Abstract

Variable-length block-coding schemes are investigated for discrete memoryless channels with 
ideal feedback under cost constraints. Upper and lower bounds are found for the minimum 
achievable probability of decoding error $P_{e,\min}$ as a function of constraints $R$, $V$, and $\bar\tau$ on the
transmission rate, average cost, and average block length respectively. For given $R$ and $V$, the
lower and upper bounds to the exponent $-(\ln P_{e,\min})/\bar\tau$ are asymptotically equal as $\bar\tau \to \infty$.
The resulting reliability function, $\lim_{\bar\tau \to \infty} (-\ln P_{e,\min})/\bar\tau$, as a function of $R$ and $V$, is concave
in the pair $(R, V)$ and generalizes the linear reliability function of Burnashev [2] to include
cost constraints. The results are generalized to a class of discrete-time memoryless channels 
with arbitrary alphabets, including additive Gaussian noise channels with amplitude and power 
constraints. 

1 Introduction 

The information-theoretic effect of feedback in communication has been studied since Shannon [16]
showed in 1956 that feedback cannot increase the capacity $C$ of a discrete memoryless channel
(DMC). At about the same time Elias [6] and Chang [15] gave examples showing that feedback 
could greatly simplify error correction at rates below capacity. 

Many of the known results about feedback communication$^1$ use block coding, i.e., coding in
which messages are transmitted sequentially and each message is completely decoded and released 
to the destination before transmission of the next message begins. Block coding for feedback com- 
munication can be further separated into fixed-length and variable-length coding. The codewords 
in a fixed-length block code all have the same length, but, due to the feedback, the symbols in each 
codeword can depend on previous channel outputs as well as the choice of transmitted message. 
For variable-length block codes, the decoding time can also depend dynamically on the previously 
received symbols. We assume that the feedback is ideal, meaning that it is noiseless, instantaneous, 
and of unlimited capacity. Thus we can assume that all information available at the receiver is
also available at the transmitter, and consequently the transmitter can determine when the receiver
decodes each message.

1 Non-block codes, with overlapping messages, sometimes have significant advantages over block coding. Sahai
[11] gives an excellent discussion of non-block coding with feedback and compares it to block coding. We restrict
ourselves here to block coding.

A widely used quality criterion for fixed-length block codes of a given rate is the error exponent,
$-\frac{\ln P_e}{\tau}$, where $P_e$ is the probability of decoding error and $\tau$ is the block length. Dobrushin [5] showed
that the sphere-packing exponent (the well known upper bound to the error exponent without
feedback) is also an upper bound for fixed-length block coding with feedback on symmetric DMC's.
It has long been conjectured that this is also true for non-symmetric DMC's, but the current best
upper bound, by Haroutunian [8], is larger than the sphere-packing bound in the non-symmetric
case.

Variable-length block coding allows the decoding to be delayed under unusually severe noise, 
thus usually providing a dramatic increase in error exponent. As explained later, the error exponent
for a variable-length block code is defined as $-\frac{\ln P_e}{\bar\tau}$ where $\bar\tau$ is the expected block length.$^2$ Similarly,
the rate$^3$ is defined as $\frac{\ln M}{\bar\tau}$ where $M$ is the size of the message set. The reliability function, $E(R)$,
for a class of coding schemes on a given channel is defined as the asymptotic maximum achievable
exponent, as $\bar\tau \to \infty$, for codes of rate greater than or equal to $R$. Burnashev [2] developed upper
and lower bounds to $E(R)$ for variable-length block codes on DMC's with ideal feedback. For
DMC's in which all transition probabilities are positive, Burnashev's upper and lower bounds to
$E(R)$ are equal. The resulting function $E(R)$ is linear, going from a positive constant at $R = 0$ to
0 at $R = C$.

For DMC's in which not all transition probabilities are positive,$^4$ Burnashev implicitly showed
that $P_e = 0$ is asymptotically achievable for all $R < C$ (i.e., $C$ is the zero-error capacity of
variable-length block codes$^5$ for such DMC's).

The main objective of this paper is to generalize Burnashev's results to DMC's subject to a cost
criterion. That is, a non-negative cost $\rho_k \ge 0$ is associated with each letter $k$ of the channel input
alphabet, $\{0, 1, \ldots, |\mathcal{X}| - 1\}$. It is assumed$^6$ that $\rho_k = 0$ for at least one choice of $k$. The energy in a
codeword $X_1, X_2, \ldots, X_\tau$, where $X_n$ is transmitted at time $n$, $1 \le n \le \tau$, and $\tau$ is the decoding time,
is defined to be $S_\tau = \rho_{X_1} + \cdots + \rho_{X_\tau}$. As explained more fully later, a variable-length block code
is defined to satisfy an average cost (power) constraint $V \ge 0$ if $E[S_\tau] \le E[\tau]\, V$. We will find the
corresponding reliability function for all $V \ge 0$. For all DMC's whose transition probabilities are
all positive, this reliability function is a concave function of $(R, V)$. If zero transition probabilities
exist, then zero error probability can be achieved at all rates below the cost-constrained capacity.

Our interest in cost criteria for DMC's is motivated by the desire to separate the effect of cost 
constraints from that of infinite alphabet size, thus allowing a better understanding of channels such 
as additive Gaussian noise where these effects are combined. Pinsker [10] considered fixed-length 
codes for the discrete-time additive white Gaussian noise channel (AWGNC) with feedback. He 
showed that the sphere-packing exponent upper bounds the error exponent if a fixed upper bound 
is placed on the energy of each codeword, independent of the noise sample values. Schalkwijk [14]
considered the same model but allowed the codeword energy to depend on the noise, subject to an
average energy constraint. He developed a simple algorithm for which the error probability decays
as a two-fold exponential of the block length (and thus also of the energy). This was an extension
of joint work with Kailath [13] where the infinite bandwidth limit of the problem was considered.
Kramer [9] later showed that the error probability could be made to decay n-fold exponentially for
any n for the infinite bandwidth case.$^7$

2 Error exponents can also be defined in various ways for non-block codes, but the interpretation does not
correspond perfectly to block coding exponents; see Sahai [11].

3 Successive messages require independent identically distributed message transmission times, so this rate is the
long-term rate at which message bits can be transferred to the receiver.

4 Trivial outputs that cannot be reached from any input are excluded throughout.

5 Thus zero-error capacity for variable-length codes with feedback can be strictly larger than zero-error capacity
for fixed-length codes with feedback, which in turn can be strictly larger than zero-error capacity without feedback.

6 The assumption that the minimum cost symbol has cost 0 causes no loss of generality, since otherwise the
minimum cost could be trivially subtracted from all symbol costs.

In the following section, we consider a class of variable-length block codes for DMC's with
feedback and cost constraints. These generalize the Yamamoto and Itoh [18] codes to allow for
cost constraints. We lower bound the achievable error exponent for these codes as a function of
constraints $R$, $V$, and $\bar\tau$ on rate, average cost, and average block length respectively.

In Section 3 we consider all possible variable-length block codes and derive a lower bound on
$\bar\tau$ as a function of power constraint $V$, average error probability $P_e$, and message-set size $M$. This
is then converted into an upper bound on the error exponent over all codes of given $R$, $V$, and
$\bar\tau$. We show that as $\bar\tau \to \infty$, this upper bound coincides with the lower bound of Section 2, thus
determining the reliability function in the presence of a cost constraint.

In Section 4 the results are generalized to a broader class of discrete-time memoryless channels
that includes AWGNC's with both power and amplitude constraints.

2 Achievability: Asymptotically optimum codes 

2.1 Forward and feedback channel models and cost constraint 

The forward channel is assumed to be a DMC of positive capacity with input alphabet $\mathcal{X} =
\{0, \ldots, |\mathcal{X}| - 1\}$ and output alphabet $\mathcal{Y} = \{0, \ldots, |\mathcal{Y}| - 1\}$. The input and output at time $n$ are
denoted by $X_n$ and $Y_n$; the n-tuples $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are denoted by $X^n$ and $Y^n$. The
feedback channel is ideal in the sense that it is discrete and noiseless with an arbitrarily large
alphabet size $|\mathcal{Z}|$ (although $|\mathcal{Z}| = |\mathcal{Y}|$ is sufficient). The symbol $Z_n$ sent from the receiver at time
$n$ can depend on $Y^n$ and is received without error at the transmitter after $X_n$ and before $X_{n+1}$ is
sent. $Z^n$ denotes $Z_1, \ldots, Z_n$.

The forward DMC is defined by the $|\mathcal{X}|$ by $|\mathcal{Y}|$ transition matrix $\{P_{kj}\}$ where, for each time $n$,
$P_{kj} = P[Y_n = j \mid X_n = k]$. The channel is memoryless in the sense that

$P[Y_n \mid X^n, Y^{n-1}, Z^{n-1}] = P[Y_n \mid X_n]$.

For each input letter $k \in \mathcal{X}$, there is a non-negative transmission cost $\rho_k \ge 0$ and at least one
$\rho_k$ is zero. The cost $S_\tau$ of transmitting a codeword of length $\tau$ is the sum of the costs of the $\tau$
symbols in the codeword. A cost constraint $V$ means that $E[S_\tau] \le V\, E[\tau]$. We usually refer to
$V$ as a power constraint and to $S_\tau$ as energy. With this definition of power constraint, $V$ can be
seen to upper bound the long-term time-average cost per symbol over a long string of independent
successive message transmissions.
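
As a minimal sketch of this definition, the following Python snippet computes the energy of each codeword and checks the average constraint with empirical averages standing in for the expectations. The cost vector, the codewords, and the function names are illustrative assumptions, not taken from the paper.

```python
def energy(codeword, costs):
    """Energy of a codeword: the sum of the per-symbol costs of its letters."""
    return sum(costs[k] for k in codeword)


def meets_power_constraint(codewords, costs, V):
    """Empirical version of E[S] <= V * E[tau] over a list of sent codewords.

    Comparing totals is equivalent to comparing averages, since both sides
    are divided by the same number of codewords.
    """
    total_energy = sum(energy(cw, costs) for cw in codewords)
    total_length = sum(len(cw) for cw in codewords)
    return total_energy <= V * total_length


# Hypothetical costs: letter 0 is free, letters 1 and 2 cost one unit each.
costs = [0.0, 1.0, 1.0]
# Hypothetical variable-length codewords (sequences of input letters).
codewords = [[0, 1, 0, 0], [1, 2, 0, 0, 0, 0], [0, 0]]

print(energy(codewords[1], costs))                      # 2.0
print(meets_power_constraint(codewords, costs, 0.25))   # True  (3 <= 0.25 * 12)
print(meets_power_constraint(codewords, costs, 0.2))    # False (3 >  0.2 * 12)
```

Note that the constraint is on the ratio of expectations, so an individual codeword may exceed $V$ per symbol as long as the average over transmissions does not.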

2.2 Fixed-length block codes with error-or-erasure decoding 

We begin with the slightly simpler problem of finding fixed-length block codes for an error-or- 
erasure decoder, i.e., a decoder which can either decode the message or produce an erasure symbol. 

The objective will be to minimize (or approximately minimize) the error probability while allowing
the erasure probability to be much larger than the error probability but still be close to zero.
In the following subsection, this error-and-erasure scheme will be converted into a variable-length
block-coding scheme by retransmitting the erased messages.

7 In fact, we have recently shown that error probability can be made to decay n-fold exponentially in the finite
bandwidth case where n is proportional to the block length.

Consider a code of fixed length $\ell$ containing two phases of length $\ell_1$ and $\ell_2$ respectively. The
first phase uses a power constraint $V_1$ and the second $V_2$. To meet an overall power constraint $V$,
we require$^8$ $\ell_1 V_1 + \ell_2 V_2 = \ell V$. Define $\eta$ as $\ell_1/\ell$, so that this power constraint becomes

$V = \eta V_1 + (1 - \eta) V_2$.

Phase 1 consists of a conventional block code without feedback, operating incrementally close to
the capacity $C(V_1)$ of the channel subject to constraint $V_1$,

$C(V_1) = \max_{\phi:\, \sum_k \phi_k \rho_k \le V_1} \sum_{k,j} \phi_k P_{kj} \ln \frac{P_{kj}}{\sum_m \phi_m P_{mj}}$.   (1)
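
The quantity maximized in the capacity expression above is the mutual information of the input assignment $\phi$, and the side condition is its average cost. As a hedged illustration, the sketch below evaluates both for a given $\phi$; any feasible $\phi$ therefore yields a lower bound on $C(V_1)$. The channel matrix, cost vector, and function names are assumptions chosen for the example, and no attempt is made here to perform the maximization itself.

```python
import math


def mutual_information(phi, P):
    """Mutual information (in nats) of input assignment phi on channel P.

    P[k][j] is the probability of output j given input k.
    """
    n_out = len(P[0])
    # Output distribution induced by phi.
    q = [sum(phi[m] * P[m][j] for m in range(len(P))) for j in range(n_out)]
    total = 0.0
    for k, row in enumerate(P):
        for j, p in enumerate(row):
            if phi[k] > 0 and p > 0:
                total += phi[k] * p * math.log(p / q[j])
    return total


def average_cost(phi, costs):
    """Average cost per symbol of input assignment phi."""
    return sum(f * c for f, c in zip(phi, costs))


# Illustrative channel: one free, completely noisy letter plus a BSC
# with crossover probability 0.1 whose two letters cost one unit each.
P = [[0.5, 0.5], [0.1, 0.9], [0.9, 0.1]]
costs = [0.0, 1.0, 1.0]
phi = [0.5, 0.25, 0.25]

print(average_cost(phi, costs))      # 0.5
print(mutual_information(phi, P))    # a lower bound on C(0.5) for this channel
```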



Here and throughout, $\phi$ is assumed to be a probability assignment, i.e., $\phi_k \ge 0$ for each $k$ and
$\sum_k \phi_k = 1$. The conventional coding theorem for a constrained DMC with fixed block length and
no feedback is as follows:$^9$ for any $\delta_1 > 0$, there is an $\epsilon_1(\delta_1) > 0$ such that, for all large enough $\ell_1$,
codes of block length $\ell_1$ exist with $M \ge e^{\ell_1 [C(V_1) - \delta_1]}$ codewords, each of energy at most $\ell_1 V_1$ and
each with error probability upper bounded by

$P_{e1} \le \exp[-\ell_1 \epsilon_1(\delta_1)]$.

Using such a code in phase 1, the decoder makes a tentative decision at the end of phase 1. The
transmitter (knowing the decision via feedback) then sends a binary codeword, $x_A$ for 'accept' and
$x_R$ for 'reject', in phase 2. Let $P_{RA}$ be the probability that the receiver decodes $x_A$ given that $x_R$
is sent. Similarly, $P_{AR}$ is the probability of decoding $x_R$ given $x_A$.

If $x_A$ is decoded, the receiver gives its tentative decision from phase 1 to the user and the
overall probability of error $P_e$ satisfies $P_e \le P_{RA}$. If $x_R$ is decoded, an erasure is released and the
probability of erasure $P_r$ satisfies $P_r \le P_{AR} + P_{e1}$. Assume for now that the power constraint may
be violated by an incrementally small amount. Thus we choose $x_A$ to satisfy the constraint, and
choose $x_R$ arbitrarily since it is rarely used. We bound $-\ln P_{RA}$ by the divergence between the
output distribution conditional on $x_A$ and the output distribution conditional on $x_R$.

To be more explicit, define the maximum single-letter divergence for the input letter $k$ as

$D_k = \max_m \sum_j P_{kj} \ln \frac{P_{kj}}{P_{mj}}$.

Note that if $P_{mj} = 0$ for some channel transition, then $D_k = \infty$ for each $k$ such that $P_{kj} > 0$. We
will see in subsection 2.5 that this leads to error-free codes at rates below capacity. In the following
subsection, we consider only channels for which $P_{kj} > 0$ for all $k \in \mathcal{X}$ and $j \in \mathcal{Y}$.
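
The maximum single-letter divergences can be computed directly from the transition matrix. The following is a small sketch under the usual conventions $0 \ln 0 = 0$ and $p \ln(p/0) = \infty$ for $p > 0$; the function name and the two example matrices are illustrative assumptions.

```python
import math


def single_letter_divergences(P):
    """Return [D_k], where D_k = max over m of sum_j P[k][j] * ln(P[k][j] / P[m][j])."""

    def divergence(p, q):
        total = 0.0
        for pj, qj in zip(p, q):
            if pj == 0.0:
                continue            # a term with P[k][j] = 0 contributes nothing
            if qj == 0.0:
                return math.inf     # row m cannot produce an output that row k can
            total += pj * math.log(pj / qj)
        return total

    return [max(divergence(row_k, row_m) for row_m in P) for row_k in P]


# All transition probabilities positive: every D_k is finite.
P_positive = [[0.5, 0.5], [0.1, 0.9], [0.9, 0.1]]
print(single_letter_divergences(P_positive))

# One zero transition probability: D_0 becomes infinite.
P_with_zero = [[0.5, 0.5], [1.0, 0.0]]
print(single_letter_divergences(P_with_zero))
```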



8 We could equally well constrain $V_1$, $V_2$ to satisfy $\ell_1 V_1 + \ell_2 V_2 \le \ell V$, but since these are all inequality constraints,
this would simply add an extra degree of freedom into the problem.

9 See, for example, Theorem 7.3.2 in [7].
2.2.1 Error-and-erasure decoding with all $P_{kj} > 0$

Assume that $P_{kj} > 0$ for all $k \in \mathcal{X}$ and $j \in \mathcal{Y}$ and, for each $k \in \mathcal{X}$, let $m_k$ be an input letter $m$
maximizing $\sum_j P_{kj} \ln \frac{P_{kj}}{P_{mj}}$. If $x_A$ contains $\phi_k \ell_2$ occurrences of letter $k$ and $x_R$ is chosen to contain
the letter $m_k$ whenever $x_A$ contains $k$, then the following minor variation of Stein's lemma results:$^{10}$
for any $\delta_2 > 0$, there is an $\epsilon_2(\delta_2) > 0$ such that

$P_{RA} \le \exp\left[-\ell_2 \sum_k \phi_k D_k + \ell_2 \delta_2\right]$   (2)

$P_{AR} \le \exp\left[-\ell_2 \epsilon_2(\delta_2)\right]$   (3)



From (2), we want to choose $x_A$ to maximize $\sum_k \phi_k D_k$ subject to the power constraint. Thus, for
a power constraint $V_2$ in phase 2, define $D(V_2)$ as

$D(V_2) = \max_{\phi:\, \sum_k \phi_k \rho_k \le V_2} \sum_k \phi_k D_k$.   (4)

The function $D(V)$ in (4) is the maximum of a linear function of $\phi$ over linear constraints. As
illustrated in Figure 1, $D(V)$ is piecewise linear, non-decreasing, and concave in its domain of
definition, $V \ge 0$. Choosing the phase 2 codewords $x_A$ and $x_R$ according to this maximization, (2)
becomes

$P_{RA} \le \exp\left[-\ell_2 D(V_2) + \ell_2 \delta_2\right]$   (5)

Figure 1: The function $D(V)$ for a channel satisfying $P_{kj} > 0$ for all $k \in \mathcal{X}$ and $j \in \mathcal{Y}$. The maximum
single-letter divergences $D_k$ are also shown. For convenience, the inputs are ordered in terms of cost
(the horizontal axis is marked with the costs $\rho_0$ through $\rho_6 = \rho_{\max}$). For
any given $V$, $D(V)$ can be achieved with at most 2 positive $\phi_k$.

The power constraint $V_2$ is then satisfied by $x_A$. The power in $x_R$ (whose probability of usage
vanishes exponentially with $\ell_2$) can be upper bounded by $\rho_{\max}$. The preceding results are summarized
in the following lemma.
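
Since the maximization defining $D(V)$ is a linear program with one normalization constraint and one cost constraint, an optimum with at most two positive $\phi_k$ always exists, as noted for Figure 1. A minimal sketch that exploits this by enumerating single letters and pairs is given below; the divergence values, the costs, and the function name are hypothetical and serve only to illustrate the piecewise-linear, concave shape.

```python
def D_of_V(V, D, costs):
    """Maximize sum_k phi_k * D[k] subject to sum_k phi_k * costs[k] <= V.

    Assumes at least one letter has cost <= V (true when some cost is 0 and V >= 0).
    """
    letters = range(len(D))
    # Case 1: a single letter that is affordable on its own.
    best = max(D[k] for k in letters if costs[k] <= V)
    # Case 2: a mixture of a cheap letter a and an expensive letter b
    # that spends the power budget exactly.
    for a in letters:
        for b in letters:
            if costs[a] <= V < costs[b]:
                phi_b = (V - costs[a]) / (costs[b] - costs[a])
                best = max(best, (1 - phi_b) * D[a] + phi_b * D[b])
    return best


# Hypothetical divergences and costs for a three-letter input alphabet.
D = [1.0, 3.0, 4.0]
costs = [0.0, 1.0, 4.0]

for V in [0.0, 0.5, 1.0, 2.0, 4.0, 10.0]:
    print(V, D_of_V(V, D, costs))
```

With these numbers the slopes of the successive linear pieces are 2, 1/3, and 0, which is the non-decreasing, concave behavior described above.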

Lemma 1 For all $V_1 \ge 0$, $V_2 \ge 0$, $0 < \eta < 1$, $\delta_1 > 0$, $\delta_2 > 0$, and all sufficiently large $\ell$, there is an
error-and-erasure code with $M \ge \exp\{\eta \ell [C(V_1) - \delta_1]\}$ such that, for each message, the probability
of error $P_e$, the probability of erasure $P_r$, and the expected energy $E[S]$ satisfy

$P_e \le \exp\{-(1 - \eta)\ell [D(V_2) - \delta_2]\}$   (6)

$P_r \le e^{-\eta \ell \epsilon_1(\delta_1)} + e^{-(1 - \eta)\ell \epsilon_2(\delta_2)}$   (7)

$E[S] \le \ell \left[\eta V_1 + (1 - \eta) V_2 + \rho_{\max} e^{-\eta \ell \epsilon_1(\delta_1)}\right]$   (8)

10 This can be derived, for example, by starting with Theorem 5 in [4] and specializing to the case of asymptotically
small s.
2.3 Variable-length block codes; all $P_{kj} > 0$

The above error-or-erasure code can form the basis of a variable-length block code with ideal
feedback. As in Yamamoto and Itoh [18], the transmitter observes each erasure via the feedback
and repeats the original message until it is accepted. For simplicity, we assume that when a message
is repeated, the receiver ignores the previous received symbols and uses the same decoding algorithm
as before. Since an error occurs independently after each repetition of the fixed-length codeword,
the overall error probability satisfies

$P_e \le \exp\{-(1 - \eta)\ell [D(V_2) - \delta_2]\}$.

The duration $\tau$ of a block is $\ell$ times the number of error-or-erasure tries until acceptance, so
$E[\tau] = \ell/(1 - P_r)$. The coefficient $1/(1 - P_r)$ goes to 1 with increasing $\ell$ and thus can be absorbed
into the arbitrary term $\delta_2$ for sufficiently large $\ell$. Similarly $\ell$ can be replaced with $\bar\tau = E[\tau]$, yielding

$P_e \le \exp\{-\bar\tau (1 - \eta)[D(V_2) - \delta_2]\}$.
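
The expected block length under repetition can be checked with a short simulation. The sketch below assumes, as in the text, that each try of length `ell` is erased independently with the same probability; the numerical values, the seed, and the function name are illustrative assumptions.

```python
import random


def simulate_block_length(ell, p_erasure, trials, seed=0):
    """Average duration of a block when erased codewords are retransmitted."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tries = 1
        while rng.random() < p_erasure:   # erasure: send the same codeword again
            tries += 1
        total += tries * ell
    return total / trials


ell, p_r = 100, 0.2
estimate = simulate_block_length(ell, p_r, trials=20000)
exact = ell / (1 - p_r)                   # mean of a geometric number of tries, times ell
print(estimate, exact)
```

The erasure probability 0.2 is deliberately large so that the effect is visible; in the scheme above $P_r$ vanishes with increasing $\ell$, so the ratio $E[\tau]/\ell$ approaches 1.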

Similarly the expected energy $E[S_\tau]$ over the entire transmission satisfies $E[S_\tau] \le E[S]/(1 - P_r)$.
Finally, using (8), the average power for each codeword is

$\frac{E[S_\tau]}{E[\tau]} \le \eta V_1 + (1 - \eta) V_2 + \rho_{\max} e^{-\eta \ell \epsilon_1(\delta_1)}$.

The following lemma summarizes these results.

Lemma 2 Assume ideal feedback for a DMC with all $P_{kj} > 0$. For all $\eta \in (0, 1)$, $V_1 \ge 0$, $V_2 \ge 0$,
$\delta > 0$, and sufficiently large $\bar\tau$, there is a variable-length block code with $M \ge \exp\{\bar\tau [\eta C(V_1) - \delta]\}$
messages, each using average power at most $\eta V_1 + (1 - \eta) V_2 + \delta$, and each with error probability

$P_e \le \exp\{-\bar\tau [(1 - \eta) D(V_2) - \delta]\}$   (9)



2.4 Optimization of the bound; all $P_{kj} > 0$

Lemma 2 can be interpreted as providing a nominal rate of transmission, $R = \eta C(V_1)$, a nominal
power constraint, $V = \eta V_1 + (1 - \eta) V_2$, and a nominal exponent of error probability, $(1 - \eta) D(V_2)$.
We have demonstrated the existence of variable-length block codes for which the actual average
rate, power, and exponent approach these values arbitrarily closely as $\bar\tau$ becomes large.

For any given $V$ and $R$ satisfying$^{11}$ $0 < R < C(V)$, we now maximize the exponent $(1 - \eta) D(V_2)$
over $0 < \eta < 1$, $V_1 \ge 0$, and $V_2 \ge 0$, subject to the constraints $R = \eta C(V_1)$ and $\eta V_1 + (1 - \eta) V_2 = V$.
Our strategy, for a given $\eta$, will be to use $R = \eta C(V_1)$ to specify $V_1$ and then use $\eta V_1 + (1 - \eta) V_2 = V$
to specify $V_2$, which is constrained by $V_2 \ge 0$. Satisfying these constraints will put some constraints
on $\eta$, and the exponent will then be a function of $R$, $V$, and $\eta$. The maximization then reduces to
a maximization over the single constrained parameter $\eta$.

The constraints on $\eta$ and the ensuing maximization depend on the properties of the capacity
function $C(V)$ illustrated in Figure 2. As can be seen from (1) and visualized in Figure 2, the
function $C(V)$ is non-negative, concave, continuous, and non-decreasing for all $V \ge 0$. It is strictly
increasing for $0 \le V \le V^*$ where $V^*$ is the smallest $V$ for which $C(V) = C^*$, the unconstrained
channel capacity. This suggests that $V_1$ is determined from $\eta$. The following two lemmas, which
are proven in the appendix, make this precise.

11 The interesting special case where $R = C(V)$ is discussed in the following subsection.
Figure 2: Typical capacity functions $C(V)$, shown in panels (a), (b), and (c). Parts a and b illustrate that
$C(0)$ can be either 0 or positive. Part c illustrates an important special case where $C(x)$ is linear from 0
to $\beta > 0$, where $\beta$ is defined as the largest $x$ for which $C(y)/y = C(x)/x$ for all $y \in (0, x]$.



Lemma 3 For any $R$, $V$ such that $0 < R < C(V)$, the equation $\eta C(V/\eta) = R$ has a unique
solution for $\eta \in (0, 1)$; that solution, say $\eta^*_{R,V}$, satisfies $\eta^*_{R,V} \in \left[\frac{R}{C^*}, \frac{R}{C(0)}\right)$.

The solution $\eta^*_{R,V}$ can also be expressed explicitly as $\eta^*_{R,V} = \frac{V}{\Gamma^{-1}(R/V)}$, where $\Gamma^{-1}$ is the
inverse of the function $\Gamma(x) = C(x)/x$ taken over the domain $x \ge \beta$, where $\beta$ is the largest $x$ for
which $\Gamma(x) = \Gamma(0)$.
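
When no closed form for the inverse is at hand, the solution of Lemma 3 can be found numerically. The sketch below uses bisection on $g(\eta) = \eta C(V/\eta)$ and relies on two properties that follow from $C$ being concave, non-negative, and bounded: $g$ is non-decreasing in $\eta$, and $g(\eta) \to 0$ as $\eta \to 0$. The capacity function used in the example (linear up to $V = 1$, then flat, with a made-up peak value) and all names are assumptions for illustration.

```python
def solve_eta(C, R, V, iterations=100):
    """Bisection for the eta in (0, 1) satisfying eta * C(V / eta) = R.

    C is a capacity-cost function (concave, non-decreasing, bounded) and
    0 < R < C(V) is required.
    """
    if not 0 < R < C(V):
        raise ValueError("requires 0 < R < C(V)")
    lo, hi = 0.0, 1.0        # invariant: g(lo) < R <= g(hi), reading g(0) as 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * C(V / mid) < R:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical capacity function: linear from 0 to 1, constant afterwards.
C_STAR = 0.368


def C(x):
    return min(x, 1.0) * C_STAR


eta = solve_eta(C, R=0.25 * C_STAR, V=0.5)
print(eta)    # for this C the solution equals R / C_STAR
```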

For any $0 < R < C(V)$, define

$I_{R,V} = \left[\eta^*_{R,V},\ \min\left\{1, \frac{R}{C(0)}\right\}\right)$   (10)



Lemma 4 For any $R$, $V$ such that $0 < R < C(V)$ and for any $\eta \in I_{R,V}$, the following properties
hold:

• There is a unique $V_1 \in [0, V^*]$ such that $R = \eta C(V_1)$.

• The corresponding $V_2$, defined by $V = \eta V_1 + (1 - \eta) V_2$, is nonnegative.

There is no $V_1, V_2 \ge 0$ such that $R = \eta C(V_1)$ and $V = \eta V_1 + (1 - \eta) V_2$ for $\eta \notin I_{R,V}$.

Thus for any $(R, V)$ pair such that $0 < R < C(V)$, and for any $\eta \in I_{R,V}$, the nominal exponent
is

$E(R, V, \eta) = (1 - \eta)\, D\!\left(\frac{V - \eta\, C^{-1}(R/\eta)}{1 - \eta}\right)$   (11)

The following lemma, proved in the appendix, shows that $E(R, V, \eta)$ is concave.

Lemma 5 The set of points $(R, V, \eta)$ such that $0 < R < C(V)$ and $\eta \in I_{R,V}$ is convex. The
function $E(R, V, \eta)$ is concave over this domain.

We next maximize the exponent $E(R, V, \eta)$ over $\eta \in I_{R,V}$,

$E(R, V) = \sup_{\eta \in I_{R,V}} (1 - \eta)\, D\!\left(\frac{V - \eta\, C^{-1}(R/\eta)}{1 - \eta}\right)$   (12)

This is simply a concave maximization over an interval. The resulting function, $E(R, V)$, is then
also concave as a function of $(R, V)$, and thus also as a function of $R$ for any given $V$. This is
illustrated in Figure 3. It can be shown that $E(R, V)$ is strictly decreasing in $R$ from $D(V)$ at
$R = 0$.

One can extend the definition of $E(R, V)$ to $R = C(V)$, for any $V$, as

$E(C(V), V) = \lim_{\delta \to 0^+} E(C(V) - \delta, V)$   (13)

The following theorem results from using $E(R, V)$ in Lemma 2.



Figure 3: A typical $E(R, V)$ curve, with the levels $D(\rho_{\max})$ and $D(V)$ and the rate $C^*$ marked on the
axes. The figure illustrates that $E(R, V)$, as a function of $R$ for fixed $V$, is concave, decreasing, and bounded.

Theorem 1 Assume ideal feedback for a DMC with all $P_{kj} > 0$. Then for all $0 < R < C(V)$,
all positive $\delta$, and all sufficiently large integer $\ell$, there is a variable-length block code of expected
length $\bar\tau$, $\ell \le \bar\tau \le \ell + 1$, with $M \ge \exp[\bar\tau (R - \delta)]$ messages such that for each message $\theta \in \mathcal{M}$, the
probability of error $P_e(\theta)$ and the expected energy $E[S_\tau(\theta)]$ satisfy

$P_e(\theta) \le \exp\{-\bar\tau [E(R, V) - \delta]\}$   (14)

$E[S_\tau(\theta)] \le \left(V + \rho_{\max} e^{-\bar\tau \epsilon(\delta)}\right) \bar\tau$,   (15)

where $\epsilon(\delta) > 0$ for each $\delta > 0$. Furthermore, the probability that the codeword length exceeds $\ell$ is at
most $\delta$.

Theorem 1 shows that the exponent $E(R, V)$ can be asymptotically achieved by this particular class
of variable-length block codes. The converse in the next section will show that no variable-length
block code can do better asymptotically, i.e., that $E(R, V)$ is the reliability function for constrained
variable-length block codes.

Theorem 1 also shows that these codes are almost fixed-length block codes, deviating from 
fixed length only with arbitrarily small probability. It is also possible to analyze the queuing delay 
for this class of codes. Note that if the source bits arrive equally spaced in time, then, even for a 
fixed-length block code, bits are delayed waiting for the next block and additionally delayed waiting 
for the block to be received and decoded. The additional delay, for variable-length block codes, 
is the queuing delay of waiting blocks while earlier blocks are retransmitted. At any $R < C$, the
probability of retransmission decreases exponentially (albeit with a small exponent) with $\bar\tau$, so it is
not surprising that the expected additional delay due to retransmissions goes to 0 with increasing
$\bar\tau$. We have shown that this indeed happens.
For $0 < R < C(V)$ and $V > 0$, (15) can be simplified by absorbing the term $\rho_{\max} e^{-\bar\tau \epsilon(\delta)}$ into
the $\delta$ of (14). This cannot be done for $V = 0$ since the constraint $E[S_\tau(\theta)] \le V \bar\tau$ for all $\theta$ reduces
to the unconstrained case where only zero-cost inputs are used. In (15), on the other hand, we are
using reject messages of positive power with asymptotically vanishing probability, with increasing
$\bar\tau$.

The requirement of ideal feedback can be relaxed to that of a noiseless feedback link of capacity
$C_{fb} > C$ and finite delay $T$ by using a modification of the error-and-erasure scheme first suggested
by Şimşek and Sahai [12] for unconstrained channels. For phase 1, the message is divided into
equal-length sub-messages which are separately encoded at a rate close to capacity and sent one after the
other. A temporary decision about each sub-message is made at the receiver and sent reliably to
the transmitter with a delay equal to $T$ plus the sub-message transmission time. In phase 2, the
entire message is rejected if any sub-message was in error and otherwise it is accepted. A single bit
of feedback is required for phase 2, and it can be shown that the various delays become amortized
over the entire message transmission as $M \to \infty$.



2.4.1 Channels for which $E(R, V) > 0$ for $R = C(V)$

In certain cases $E(C(V), V)$ as defined in equation (13) is strictly positive.$^{12}$ We start with a simple
example of this phenomenon and then delineate the cases where it is possible.

Example 1: BSC with extra free symbol: Consider a binary symmetric channel (BSC) in
which each input symbol has unit cost. There is an additional cost-free symbol that is completely
noisy. That is, the transition probabilities and costs are as follows:

$P_{kj} = \begin{bmatrix} 1/2 & 1/2 \\ \alpha & 1 - \alpha \\ 1 - \alpha & \alpha \end{bmatrix}, \qquad \rho_k = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$   (16)

where $0 < \alpha < 1/2$. Letting $C_{BSC} = \ln 2 - h(\alpha)$, where $h(\alpha) = -(1 - \alpha)\ln(1 - \alpha) - \alpha \ln \alpha$ is the
binary entropy, it can be seen that$^{13}$

$C(V) = V\, C_{BSC}$ for $0 \le V \le 1$, and $C(V) = C_{BSC}$ for $V > 1$.

Assume a power constraint $V = 1/2$ and rate $R = C(V) = \frac{1}{2} C_{BSC}$. One half unit of energy is
required in phase 1 and the maximum interval for phase 2 is provided by choosing $V_1 = 1$ and
$\eta = 0.5$. Thus we transmit at the unconstrained capacity during phase 1 and transmit at zero
power for phase 2. In phase 2, the zero symbol is used for $x_A$ and either of the unit-cost symbols
is used for $x_R$. This yields an exponent $E(R, V) = \frac{1}{2} D_0 = \frac{1}{4} \ln\left(\frac{1}{4\alpha(1 - \alpha)}\right)$, which is clearly positive.
What is happening is that zero nominal power (i.e., zero power except for the rare transmission of
reject symbols) provides a positive exponent.

Now consider $E(R, V)$ for this channel for any $R < C(V)$ and $V < 1$. It can be seen that
$\eta^*_{R,V} = R/C_{BSC}$. It can also be seen, as above, that this value of $\eta$, corresponding to $V_1 = 1$,
maximizes $E(R, V, \eta)$. Thus

$E(R, V) = (1 - \eta)\, D\!\left(\frac{V - \eta}{1 - \eta}\right)$ where $\eta = \frac{R}{C_{BSC}}$.

12 This result contains the epsilons and deltas of Theorem 1 and thus does not assert reliable transmission 'at
capacity', but the existence of a positive exponent at $R = C(V)$ is still very surprising.

13 The linearity of the capacity function here follows from the fact that the output probabilities in (16) are the same
for the free symbol and the equiprobable use of the BSC symbols.
From (16), $D(x) = D_0 + (D_1 - D_0)x$ for $0 \le x \le 1$, so

$E(R, V) = (1 - \eta) D_0 + (V - \eta)(D_1 - D_0)$ where $\eta = \frac{R}{C_{BSC}}$.

This is illustrated in Figure 4. Consider the limit of $E(R, V)$ as $R$ approaches $C(V)$ from below
and thus $\eta$ approaches $C(V)/C_{BSC} = V$. This limit is $(1 - V) D_0$ and is asymptotically achievable
at $R = C(V)$ by transmitting with power 1 in phase 1 and then transmitting with a nominal power
equal to 0 during phase 2.

Figure 4: $E(R, V)$ for a BSC with a zero-cost noise symbol, plotted against $R$ up to $C_{BSC}$. For $V < 1$, the
exponent decreases linearly to a positive value at capacity.

More generally, any DMC for which $\beta > 0$ has the property that if $V < \beta$, then $E(R, V)$ is
positive at $R = C(V)$ and affine in $R$ for $0 < R < C(V)$.
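
The closed form for Example 1 is easy to evaluate numerically. In the sketch below, $D_0$ is the value quoted in the example, while the expression used for $D_1$ is our own evaluation of the single-letter divergence definition on the BSC rows of the transition matrix and should be read as an assumption; the function names and the choice $\alpha = 0.1$ are likewise illustrative.

```python
import math


def bsc_capacity(alpha):
    """C_BSC = ln 2 - h(alpha), in nats."""
    h = -(1 - alpha) * math.log(1 - alpha) - alpha * math.log(alpha)
    return math.log(2) - h


def example1_exponent(R, V, alpha):
    """E(R, V) = (1 - eta) * D0 + (V - eta) * (D1 - D0) with eta = R / C_BSC.

    Valid for 0 < V < 1 and 0 < R <= C(V) = V * C_BSC; at R = C(V) it gives
    the limiting value (1 - V) * D0.
    """
    c_bsc = bsc_capacity(alpha)
    if not (0 < V < 1 and 0 < R <= V * c_bsc):
        raise ValueError("requires 0 < V < 1 and 0 < R <= V * C_BSC")
    d0 = 0.5 * math.log(1 / (4 * alpha * (1 - alpha)))
    # Assumption: divergence between the two BSC rows, which exceeds the
    # divergence from a BSC row to the free symbol's uniform row.
    d1 = (1 - 2 * alpha) * math.log((1 - alpha) / alpha)
    eta = R / c_bsc
    return (1 - eta) * d0 + (V - eta) * (d1 - d0)


alpha, V = 0.1, 0.5
c_bsc = bsc_capacity(alpha)
for fraction in [0.2, 0.6, 1.0]:
    R = fraction * V * c_bsc
    print(fraction, example1_exponent(R, V, alpha))
```

The last printed value, at $R = C(V)$, is the positive exponent $\frac{1}{2} D_0$ discussed in the example.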



2.4.2 Alternative approaches to finding E(R,V) 

The reliability function $E(R, V)$ is expressed in (12) as an optimization over $E(R, V, \eta)$ and as
such involves calculating $C(V_1)$ and $D(V_2)$ as subproblems. An alternative that might be more
convenient numerically is to express $E(R, V)$ directly as a concave optimization over the input
probabilities in phases 1 and 2 subject to the constraints corresponding to a given $R$ and $V$.

Another alternative, which is more interesting conceptually, is to investigate how the phase 1
and phase 2 powers must be related. Consider the equivalent problem of finding the minimal power
$V$ required for a given rate $R$ and exponent $E$. We will derive a necessary condition for $V_1 \ge 0$,
$V_2 \ge 0$, and $0 < \eta < 1$ to achieve this minimum power. First consider the special case in which
$C(V)$ is continuously differentiable for $V > 0$ and let $A_1 = \eta V_1$ be the phase 1 power amortized
over both phases. The partial derivative of $A_1$ with respect to $\eta$ for a given $R = \eta C(A_1/\eta)$ is then

$\frac{\partial A_1}{\partial \eta}\Big|_R = -\frac{\partial[\eta C(A_1/\eta)]/\partial \eta}{\partial[\eta C(A_1/\eta)]/\partial A_1} = V_1 - \frac{C(V_1)}{C'(V_1)}$   (17)

Geometrically, this is the horizontal axis intercept of the tangent to $C(\cdot)$ at $V_1$.

In the general case, $C(V_1)$ can have slope discontinuities at particular values of $V_1$; because of
these discontinuities the left and right derivatives and the corresponding tangents and intercepts
become different from each other (see Figure 5).

In the same way, let $A_2 = (1 - \eta) V_2$. Then, holding the exponent $E$ fixed,

$\frac{\partial A_2}{\partial \eta}\Big|_E = -\frac{\partial[(1 - \eta) D(A_2/(1 - \eta))]/\partial \eta}{\partial[(1 - \eta) D(A_2/(1 - \eta))]/\partial A_2} = -V_2 + \frac{D(V_2)}{D'(V_2)}$   (18)
Figure 5: Tangents to $C(V)$ at a point of slope discontinuity. $\frac{\partial A_1}{\partial \eta^-}\big|_R$ is the derivative corresponding to
negative change in $\eta$ (positive change in $V_1$) and $\frac{\partial A_1}{\partial \eta^+}\big|_R$ is the derivative corresponding to positive
change in $\eta$ (negative change in $V_1$).



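This is the negative of the horizontal axis intercept of the tangent to $D(\cdot)$ at $V_2$. At points of slope
discontinuity in $D(\cdot)$, this must be replaced with

$\frac{\partial A_2}{\partial \eta^{\pm}}\Big|_E = -V_2 + \frac{D(V_2)}{\partial[D(V_2)]/\partial[V_2^{\pm}]}$.

Finally, the overall power constraint is $V = A_1 + A_2$, so

$\frac{\partial V}{\partial \eta^{+}}\Big|_{R,E} = \frac{\partial A_1}{\partial \eta^{+}}\Big|_R - V_2 + \frac{D(V_2)}{\partial[D(V_2)]/\partial[V_2^{+}]}$

$\frac{\partial V}{\partial \eta^{-}}\Big|_{R,E} = \frac{\partial A_1}{\partial \eta^{-}}\Big|_R - V_2 + \frac{D(V_2)}{\partial[D(V_2)]/\partial[V_2^{-}]}$

For $V_1$, $V_2$, and $\eta$ to minimize $V$ for fixed $R$, $E$, it is necessary that $\frac{\partial V}{\partial \eta^{+}}\big|_{R,E} \ge 0$ (i.e., that an
incremental increase in $\eta$ does not reduce $V$) and that $\frac{\partial V}{\partial \eta^{-}}\big|_{R,E} \le 0$ (i.e., that an incremental
decrease in $\eta$ does not reduce $V$). Geometrically, what this says is that the horizontal intercept
of the tangent to $C(\cdot)$ at $V_1$, which in general is the interval $\left[\frac{\partial A_1}{\partial \eta^{-}}\big|_R, \frac{\partial A_1}{\partial \eta^{+}}\big|_R\right]$, must overlap with
the horizontal intercept of the tangent to $D(\cdot)$ at $V_2$, i.e., with the interval $\left[-\frac{\partial A_2}{\partial \eta^{+}}\big|_E, -\frac{\partial A_2}{\partial \eta^{-}}\big|_E\right]$.
Note that these intervals reduce to single points in the absence of slope discontinuities in $C(V_1)$ or
$D(V_2)$.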

It is surprising that these conditions do not involve $\eta$. The following example shows how these
conditions can be used.

Example 2: Combined 4-input symmetric channel, BSC, and free symbol: Consider
the following DMC with seven input letters and four output letters;
where $\epsilon = 1/75$, $\delta = 1/100$.

$C(V)$ is piecewise linear for the same reason as in the previous example; $C(V)$ and $D(V)$ are
given in Figure 6.

The above necessary conditions on $V_1$ and $V_2$ imply

$V_1 = 1$ and $1 \ge V_2 \ge 0$
$4 \ge V_1 \ge 1$ and $V_2 = 1$
$V_1 = 4$ and $V_2 \ge 1$   (19)
Using these conditions and the set constraint, we can calculate $E(R, V)$ for any given $V$; the
solutions for V = 1 and V = 5 are given in Figure 7.

Figure 7: Reliability function for $V = 0.5$ and $V = 2.5$ (axis ticks at 0.298, 0.587, and 0.885); each straight
line segment is characterized by either constant $V_1$ or constant $V_2$ according to (19).



2.5 Zero-error capacity; Channels with at least one $P_{kj} = 0$

The form of $E(R, V)$ relies heavily on the assumption that $P_{kj} > 0$ for all $k, j$. To see why, assume
$P_{mj} = 0$ for some $m, j$. Since all outputs are assumed to be reachable, $P_{kj}$ must be strictly positive
for some $k$ and that same $j$. In this case $D_k = \infty$. Suppose that the 'accept' codeword of section
2 uses all k's, the 'reject' message all m's, and that the receiver decodes 'accept' only if it receives
one or more j's. In this case, no errors can ever occur for the corresponding variable-length block
code.
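
The zero-error property of this phase 2 rule can be illustrated with a small simulation. The two-input channel below, the phase 2 length of 20 symbols, the seed, and the function name are all hypothetical choices; the only feature that matters is that input $m$ can never produce output $j$ while input $k$ can.

```python
import random


def phase2_decision(sent_letter, P, j, n_symbols, rng):
    """Send sent_letter n_symbols times; decode 'accept' iff output j is ever seen."""
    outputs = list(range(len(P[sent_letter])))
    for _ in range(n_symbols):
        y = rng.choices(outputs, weights=P[sent_letter])[0]
        if y == j:
            return "accept"
    return "reject"


# Hypothetical channel: input k = 0 reaches output j = 1 with probability 0.3,
# input m = 1 never produces output j = 1.
P = [[0.7, 0.3],
     [1.0, 0.0]]
rng = random.Random(1)

# Reject codeword (all m's) sent: a false 'accept' would be an undetected error.
false_accepts = sum(phase2_decision(1, P, 1, 20, rng) == "accept" for _ in range(1000))
# Accept codeword (all k's) sent: a 'reject' only causes an erasure and a retransmission.
missed_accepts = sum(phase2_decision(0, P, 1, 20, rng) == "reject" for _ in range(1000))

print(false_accepts)    # 0
print(missed_accepts)   # small: each miss has probability 0.7 ** 20
```

Errors are impossible by construction, while the occasional missed 'accept' only lengthens the transmission.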

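Asymptotically, phase 2 can occupy a negligible portion of the block, say $\ln \ell$ of $\ell$ symbols. Then
for any $\delta > 0$, and all large enough block lengths $\ell$, an error-and-erasure code exists with
$M \ge e^{(\ell - \ln \ell)(C(V) - \delta)}$, $P_e = 0$, $P_r \le e^{-(\ell - \ln \ell)\epsilon(\delta)} + \ell^{\ln(1 - q)}$, and expected energy
$E[S_\ell] \le \ell V + \rho_{\max} \ln \ell$, where $q$ denotes the positive transition probability $P_{kj}$, so that
$(1 - q)^{\ln \ell} = \ell^{\ln(1 - q)}$ is the probability that no $j$ is received when 'accept' is sent.

After a little analysis the following theorem results:

Theorem 2 Assume ideal feedback for a DMC with at least one $P_{kj} = 0$. Then for all $0 < R <
C(V)$, all positive $\delta$, and all sufficiently large $\bar\tau$, there is a variable-length block code satisfying

$M \ge e^{\bar\tau (R - \delta)}$, $\qquad P_e = 0$, $\qquad E[S_\tau] \le V \bar\tau + 2 \rho_{\max} \ln \bar\tau$.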
3 The converse: relating $\bar\tau$ and $P_e$

We have established an upper bound on $P_e$ for given rate $R$, power $V$, and expected block length
$\bar\tau$ by developing and analyzing a particular class of variable-length block codes. Here we develop
a lower bound to $P_e$ which, for large enough $\bar\tau$, is valid for all variable-length block codes. The
lower bound uses the idea of a two-phase analysis, but, as will be seen, this does not restrict the
encoding or decoding. We start by finding a lower bound on the expected time $E[\tau_1]$ spent in the
first phase and a related lower bound on the expected time $E[\tau - \tau_1]$ spent in the second phase.

The analysis is a simplification and generalization of Burnashev [2] and is based on the evo- 
lution at each time n of the conditional message entropy, conditioned on the observations at the 
receiver. The first phase is the interval until this conditional entropy drops from $\ln M$ to some
fixed intermediate value, taken here to be 1. The second phase is the interval until this conditional 
entropy further drops to meet the constraint on error probability; Fano's inequality is used to link 
the conditional entropy to the error probability. In the first phase we create a stochastic sequence 
related to the decrease in conditional entropy at each instant n, and in the second phase we create 
a stochastic sequence related to the decrease in the logarithm of the conditional entropy. 

Establishing this lower bound to $P_e$ is more involved than the upper bound to $P_e$, since the
lower bound must apply to all variable-length block codes. We start with a more precise definition
of variable-length block codes. After that we bound the expected change of conditional entropy
and its logarithm, first in one time unit and second between two stopping times. Then these are
used to lower bound the probability of error. The resulting upper bound on the reliability function
agrees with the lower bound in Section 2.



3.1 Mathematical preliminaries and Fano's inequality 

In a variable-length block code, the transmitter is assumed to initially receive one of $M$ equiprobable
messages from the set $\mathcal{M} = \{1, \ldots, M\}$. It transmits successive channel symbols about that
message, say message $\theta$, until the receiver makes a decision and releases the decoded message to
the user. The time of this decision is a random variable denoted by $\tau$. We assume throughout that
$E[\tau] = \bar{\tau} < \infty$, since otherwise any desired lower bound to $\bar{\tau}$ is obviously satisfied.

Given noiseless feedback, we can restrict our attention to encoding algorithms in which each
input symbol $X_n$ is a deterministic function of message and feedback (see footnote 14):

$$X_n = X_n(\theta, Z^{n-1}) \qquad \forall Z^{n-1},\ \forall \theta. \tag{20}$$

The entire observation of the receiver up to time $n$, including $Y^n$ and any additional random choices,
can be summarized by the $\sigma$-field $\mathcal{F}_n$ generated by these random variables. The nested sequence
of $\mathcal{F}_n$'s is called a filtration $\mathcal{F}$.

At each time $n$, depending on the realization $f_n$ of the $\sigma$-field $\mathcal{F}_n$, the receiver has an a posteriori
probability $p_i(f_n)$ for each $i$ in $\mathcal{M}$. The corresponding conditional entropy of the message, given
$\mathcal{F}_n$, is a random variable $\mathcal{H}_{\mathcal{F}_n}$, measurable in $\mathcal{F}_n$. Its sample value for any realization $f_n \in \mathcal{F}_n$ is
given by

$$\mathcal{H}_{f_n} = H(\theta \mid \mathcal{F}_n = f_n) = -\sum_{i=1}^{M} p_i(f_n) \ln p_i(f_n).$$

14 This allows the receiver to feed back not only the channel outputs but also some random choices. Random choices
at the transmitter provide no added generality since those choices (for all possible $\theta$) could be made earlier at the
receiver with no loss of performance.

A decoding algorithm includes a decision rule about continuing or stopping the communication,
depending on the observations up to that time, i.e., a Markov stopping time with respect to the
filtration $\mathcal{F}$. The message is also decoded at this stopping time. In order to define the various
random variables at all times $n \ge 1$, rather than only times up to the stopping time, we will
assume that $X_n(\theta, Z^{n-1})$ is equal to some given zero-cost symbol for all $n > \tau$ and all $\theta$. Thus
$S_n = S_\tau$ for all $n \ge \tau$. Thus if a variable-length block code (henceforth simply called a code)
satisfies a cost constraint $E[S_\tau] \le \mathcal{P}\bar{\tau}$, then $E[S_\tau] < \infty$ and $E[S_n] < \infty$ for all $n$.

Fano's inequality can be applied for each $f_\tau$ to upper bound the conditional entropy $\mathcal{H}_{f_\tau}$ in
terms of the error probability of the decoding at $f_\tau$. Taking the expectation (see footnote 15) of these inequalities
over $f_\tau \in \mathcal{F}_\tau$, and using the concavity of the binary entropy $h(x)$, the expected value $E[\mathcal{H}_{\mathcal{F}_\tau}]$
can be upper bounded at the decoding time by

$$\begin{aligned}
E[\mathcal{H}_{\mathcal{F}_\tau}] &\le h(P_e) + P_e \ln(M-1) \\
&\le P_e(\ln M - \ln P_e + 1).
\end{aligned} \tag{21}$$

This suggests that the conditional entropy is usually very small at the decoding time, motivating a 
focus on how fast the logarithm of the entropy changes in the second phase of the analysis below. 
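To make the two steps of the Fano bound concrete, here is a minimal Python sketch that evaluates both right-hand sides for given values of the error probability and the number of messages. The function names are illustrative and entropies are measured in nats.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy h(x) in nats, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def fano_bounds(p_e: float, num_messages: int):
    """Return (tight, loose): the two successive upper bounds on the expected
    conditional entropy at the decoding time, for 0 < p_e < 1 and M >= 2."""
    tight = binary_entropy(p_e) + p_e * math.log(num_messages - 1)
    loose = p_e * (math.log(num_messages) - math.log(p_e) + 1.0)
    return tight, loose
```

For small error probabilities both bounds are far below $\ln M$, which is the numerical content of the remark that the conditional entropy is usually very small at the decoding time.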

3.2 Bounds on the change of conditional entropy 

For any DMC, any code, and any $\mathcal{P} > 0$, define the stochastic sequence $\{V_n; n \ge 0\}$ as

$$V_n \triangleq \mathcal{H}_{\mathcal{F}_n} + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_n \mid \mathcal{F}_n] - n\mathcal{P}\right) \tag{22}$$

where $\gamma_C^{\mathcal{P}} \ge 0$ is the Lagrange multiplier for the cost constraint in the maximization of $C(\mathcal{P})$ over
input probabilities in (1). The random variable $V_n$ will be used to bound the entropy and energy
changes in phase 1.

Lemma 6 For any DMC, any code, and any $\mathcal{P} > 0$, the sequence $\{V_n; n \ge 0\}$ is a submartingale,
i.e.,

$$E[|V_n|] < \infty \quad \text{and} \quad E[V_{n+1} \mid \mathcal{F}_n] \ge V_n \quad \text{for all } n \ge 0.$$

This lemma applies to all codes, whether or not they have a cost constraint equal to the $\mathcal{P}$ in the
definition of $V_n$. Proofs of Lemmas 6 to 10 are given in the appendix.
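The submartingale property can be checked numerically in a simple special case. The sketch below assumes a binary symmetric channel with all input costs equal to zero, so that the cost term in $V_n$ vanishes and the property reduces to the statement that the expected one-step drop in conditional entropy is at most the capacity $C$. The channel, the helper names, and the random test posteriors are all illustrative assumptions, not part of the lemma.

```python
import math
import random


def entropy(p):
    """Entropy in nats of a probability vector (zero entries contribute nothing)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def bsc_capacity(eps):
    """Capacity in nats of a binary symmetric channel with crossover eps."""
    return math.log(2) - entropy([eps, 1 - eps])


def expected_entropy_drop(posterior, inputs, eps):
    """E[H_n - H_{n+1} | f_n] for one use of a BSC.

    posterior[i] is P[theta = i | f_n]; inputs[i] in {0, 1} is the symbol
    X_{n+1} that the encoder sends when the message is i.
    """
    drop = entropy(posterior)
    for y in (0, 1):
        # Joint probability of (theta = i, Y_{n+1} = y) given f_n.
        joint = [p * ((1 - eps) if x == y else eps)
                 for p, x in zip(posterior, inputs)]
        p_y = sum(joint)
        if p_y > 0:
            drop -= p_y * entropy([v / p_y for v in joint])
    return drop


def worst_case_drop(num_messages=8, eps=0.1, trials=200, seed=0):
    """Largest expected drop seen over random posteriors and encoder maps."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        weights = [rng.random() for _ in range(num_messages)]
        total = sum(weights)
        posterior = [w / total for w in weights]
        inputs = [rng.randint(0, 1) for _ in range(num_messages)]
        worst = max(worst, expected_entropy_drop(posterior, inputs, eps))
    return worst, bsc_capacity(eps)
```

The expected drop equals the mutual information between the message and the next output, which the data processing inequality bounds by the mutual information between input and output, and hence by $C$. Two equiprobable messages mapped to different inputs attain the bound with equality.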

The following two lemmas develop another submartingale based on the log entropy. 

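Lemma 7 For any DMC with all $P_{kj} > 0$, any code, and any $n \ge 0$,

$$E[\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}} \mid \mathcal{F}_n] \le E[D_{X_{n+1}} \mid \mathcal{F}_n]. \tag{23}$$

Another stochastic sequence, $\{W_n; n \ge 0\}$, is now defined (see footnote 16) that combines the changes in log
entropy and cost:

$$W_n \triangleq \ln \mathcal{H}_{\mathcal{F}_n} + nD(\mathcal{P}) + \gamma_D^{\mathcal{P}}\left(E[S_n \mid \mathcal{F}_n] - n\mathcal{P}\right), \tag{24}$$

where $\gamma_D^{\mathcal{P}} \ge 0$ is the Lagrange multiplier for the cost constraint in the maximization of $D(\mathcal{P})$ over
input probabilities in the definition of $D(\mathcal{P})$.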

15 The facts that $|\mathcal{H}_{f_\tau}| \le \ln M$ and $|P_e| \le 1$, combined with Lebesgue's dominated convergence theorem, allow us
to interchange the limit and expectation here.

16 Note that $W_n$ can be $-\infty$ for DMC's in which $P_{kj} = 0$ for one or more transitions, but $W_n$ will be used only
for DMC's in which all $P_{kj} > 0$.

Lemma 8 For any DMC with all $P_{kj} > 0$, for any code, and for any $\mathcal{P}$, $0 < \mathcal{P} < \infty$, the sequence
$\{W_n; n \ge 0\}$ is a submartingale, i.e.,

$$E[|W_n|] < \infty \quad \text{and} \quad E[W_{n+1} \mid \mathcal{F}_n] \ge W_n \quad \text{for all } n \ge 0.$$

3.3 Measuring Time with Submartingales 

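The following lemmas are used to lower bound the expected stopping times for phases 1 and 2 in
terms of $\{V_n; n \ge 0\}$ and $\{W_n; n \ge 0\}$.

Lemma 9 For any DMC and any code, if a stopping time $\tau_i$ satisfies

$$E[\tau_i] < \infty \quad \text{and} \quad E[S_{\tau_i}] \le \mathcal{P} E[\tau_i]$$

then

$$C(\mathcal{P}) E[\tau_i] \ge E[\mathcal{H}_{\mathcal{F}_0} - \mathcal{H}_{\mathcal{F}_{\tau_i}}]. \tag{25}$$

Lemma 10 For any DMC with all $P_{kj} > 0$ and any code, if a pair of stopping times $(\tau_i \le \tau_f)$
satisfies

$$E[\tau_f] < \infty \quad \text{and} \quad E[S_{\tau_f} - S_{\tau_i}] \le \mathcal{P} E[\tau_f - \tau_i]$$

then

$$D(\mathcal{P}) E[\tau_f - \tau_i] \ge E[\ln \mathcal{H}_{\mathcal{F}_{\tau_i}} - \ln \mathcal{H}_{\mathcal{F}_{\tau_f}}]. \tag{26}$$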

The bounds asserted by these lemmas are tight in the sense that when they are used with the
stopping times to be specified later, they will show that $E(R, \mathcal{P})$ in (12) is an upper bound on the
reliability function.

3.4 Lower bounding $\bar{\tau}$ for DMC's with all $P_{kj} > 0$

We now derive lower bounds on the expected decoding time for any variable-length block code with 
M equiprobable messages, subject to a given cost constraint V and a required probability of error 
P e . The first result is simply an explicit statement of the well-known impossibility of transmitting 
reliably at rates above C(V). 

Theorem 3 For any DMC, any code with $M \ge 2$ equiprobable messages, any $\mathcal{P} > 0$, and any
required error probability $P_e > 0$ and cost constraint $\mathcal{P}$, the expected decoding time satisfies

$$E[\tau] \ge \frac{\ln M - P_e(\ln M - \ln P_e + 1)}{C(\mathcal{P})}. \tag{27}$$

Proof: From (21), $E[\mathcal{H}_{\mathcal{F}_\tau}] \le P_e(\ln M - \ln P_e + 1)$. Thus, since $\mathcal{H}_{\mathcal{F}_0} = \ln M$, (27) results from (25)
with $\tau_i = \tau$. QED

This result is valid both for the case where all $P_{kj} > 0$ and the zero-error case where some
$P_{kj} = 0$. In the zero-error case, we already know that $P_e = 0$ is asymptotically achievable for
$R < C(\mathcal{P})$, so our remaining task is to show that $E(R,\mathcal{P})$ is an upper bound as well as a lower
bound to the reliability function in the case where $R < C(\mathcal{P})$ and all $P_{kj} > 0$.
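The bound (27) is easy to evaluate numerically. The following Python sketch does so; the function name and the sample values of $M$, $P_e$, and capacity used with it are illustrative assumptions, with capacity expressed in nats per channel use.

```python
import math


def min_expected_decoding_time(num_messages: int, p_e: float, capacity: float) -> float:
    """Right-hand side of (27): a lower bound on E[tau].

    num_messages >= 2, 0 < p_e < 1, and capacity = C(P) > 0 in nats per use.
    """
    ln_m = math.log(num_messages)
    fano_term = p_e * (ln_m - math.log(p_e) + 1.0)
    return (ln_m - fano_term) / capacity
```

For example, with $M = 2^{20}$ and $P_e = 10^{-6}$ the correction term is tiny, so the bound is essentially $(\ln M)/C(\mathcal{P})$, which restates that the rate $(\ln M)/\bar{\tau}$ cannot exceed $C(\mathcal{P})$ when the error probability is small.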
3.4.1 Lower bounding $\bar{\tau}$ for DMC's with all $P_{kj} > 0$

The main issue in this lower bound is finding an intermediate Markov stopping time $\tau_1$ which
will divide the message transmission interval into two disjoint phases (see footnote 17) such that the duration
of each can be lower bounded by Lemmas 9 and 10 respectively. Consider the stopping time
$t_1 = \min\{n \mid \mathcal{H}_{\mathcal{F}_n} \le 1\}$ in filtration $\mathcal{F}$. This does not quite work as an intermediate stopping time,
since a variable-length code could in principle occasionally decode before $\mathcal{H}_{\mathcal{F}_n} \le 1$. Instead we
use $\tau_1 = \min(\tau, t_1)$ to define the end of the first phase. This is also a Markov stopping time, and
$0 \le \tau_1 \le \tau$, so this is a well defined intermediate time for all codes.

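We now apply Lemma 9 to $\tau_1$. Let $E[S_{\tau_1}]$ be the expected energy used by any given code in
this first phase and let $\mathcal{P}_1 = E[S_{\tau_1}]/E[\tau_1]$. Then (25) becomes

$$E[\mathcal{H}_{\mathcal{F}_0} - \mathcal{H}_{\mathcal{F}_{\tau_1}}] \le C(\mathcal{P}_1) E[\tau_1]. \tag{28}$$

We first find an upper bound to $E[\mathcal{H}_{\mathcal{F}_{\tau_1}}]$. By definition of $t_1$, $\mathcal{H}_{\mathcal{F}_{t_1}} \le 1$, but $\mathcal{H}_{\mathcal{F}_{\tau_1}}$ might be greater
than 1 if $\mathcal{H}_{\mathcal{F}_\tau} > 1$. Thus we can upper bound $E[\mathcal{H}_{\mathcal{F}_{\tau_1}}]$ by

$$\begin{aligned}
E[\mathcal{H}_{\mathcal{F}_{\tau_1}}] &\le 1 + P[\mathcal{H}_{\mathcal{F}_\tau} > 1]\, E[\mathcal{H}_{\mathcal{F}_{\tau_1}} \mid \mathcal{H}_{\mathcal{F}_\tau} > 1] \\
&\le 1 + P[\mathcal{H}_{\mathcal{F}_\tau} > 1] \ln M && (29) \\
&\le 1 + E[\mathcal{H}_{\mathcal{F}_\tau}] \ln M && (30) \\
&\le 1 + P_e(\ln M - \ln P_e + 1)\ln M, && (31)
\end{aligned}$$

where in (29) we upper bounded $E[\mathcal{H}_{\mathcal{F}_{\tau_1}} \mid \mathcal{H}_{\mathcal{F}_\tau} > 1]$ by $\ln M$, the maximum entropy for any ensemble
of $M$ elements. We used the Markov inequality in (30) and then (21) in (31).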

Since the messages are a priori equiprobable, $\mathcal{H}_{\mathcal{F}_0} = \ln M$, so substituting this and (31) into
(28),

$$E[\tau_1] \ge \frac{\ln M\left[1 - P_e(\ln M - \ln P_e + 1) - \frac{1}{\ln M}\right]}{C(\mathcal{P}_1)}. \tag{32}$$

As shown later, the term in brackets essentially approaches 1 as $P_e \to 0$ and thus $E[\tau_1]$ is approximately
lower bounded by $(\ln M)/C(\mathcal{P}_1)$.

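Next we find the expected time $E[\tau - \tau_1]$ spent in phase 2. Here we use (26) from Lemma 10
with the initial time $\tau_i$ chosen to be $\tau_1$ and the final time $\tau_f$ chosen to be $\tau$. Let $E[S_\tau - S_{\tau_1}]$ be
the expected energy used by the given code in this second phase and let $\mathcal{P}_2 = E[S_\tau - S_{\tau_1}]/E[\tau - \tau_1]$. Then

$$E[\ln \mathcal{H}_{\mathcal{F}_{\tau_1}} - \ln \mathcal{H}_{\mathcal{F}_\tau}] \le D(\mathcal{P}_2) E[\tau - \tau_1]. \tag{33}$$

We lower-bound $E[\ln \mathcal{H}_{\mathcal{F}_{\tau_1}} - \ln \mathcal{H}_{\mathcal{F}_\tau}]$ by upper-bounding $E[\ln \mathcal{H}_{\mathcal{F}_\tau}]$ and lower-bounding $E[\ln \mathcal{H}_{\mathcal{F}_{\tau_1}}]$.
By Jensen's inequality, $E[\ln \mathcal{H}_{\mathcal{F}_\tau}] \le \ln E[\mathcal{H}_{\mathcal{F}_\tau}]$, so from (21),

$$E[\ln \mathcal{H}_{\mathcal{F}_\tau}] \le \ln[P_e(\ln M - \ln P_e + 1)]. \tag{34}$$

To lower-bound $E[\ln \mathcal{H}_{\mathcal{F}_{\tau_1}}]$, we use the following lemma.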



17 There is a nice intuitive relation between these two phases used in the converse and the two phases used in the 
variable- length block codes of Section [231 since in each case the first phase deals with a large sea of messages and the 
second deals essentially with a binary hypothesis. When an error-and-erasure codeword is repeated, however, phase 
1 as defined here could end during any one of those repetitions. 
Lemma 11 For any DMC with all $P_{kj} > 0$, any code, and any $n \ge 0$,

$$\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}} \le \max_{k,m,j} \ln\frac{P_{kj}}{P_{mj}} \triangleq F. \tag{35}$$

Since $\mathcal{H}_{\mathcal{F}_{\tau_1 - 1}} > 1$, i.e., $\ln \mathcal{H}_{\mathcal{F}_{\tau_1 - 1}} > 0$, the lemma implies that $\ln \mathcal{H}_{\mathcal{F}_{\tau_1}} \ge -F$.
Substituting this and (34) into (33),

$$E[\tau - \tau_1] \ge \frac{-\ln P_e - F - \ln[\ln M - \ln P_e + 1]}{D(\mathcal{P}_2)}. \tag{36}$$

As shown later, the numerator is essentially $(-\ln P_e)$ in the limit of small $P_e$. Now we can find an
upper bound on $(-\ln P_e)/\bar{\tau}$, for codes of rate $(\ln M)/\bar{\tau}$, using the above result.
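The two phase bounds (32) and (36) can be evaluated side by side. The sketch below is illustrative only: the function names are hypothetical, the values passed for $C(\mathcal{P}_1)$ and $D(\mathcal{P}_2)$ are treated as given numbers rather than computed from a channel, and $F$ is computed from a transition matrix supplied as a list of rows with all entries positive.

```python
import math


def max_log_ratio(P):
    """F = max over k, m, j of ln(P[k][j] / P[m][j]); requires all entries > 0."""
    rows, cols = range(len(P)), range(len(P[0]))
    return max(math.log(P[k][j] / P[m][j])
               for k in rows for m in rows for j in cols)


def phase1_time_bound(num_messages, p_e, c1):
    """Right-hand side of (32), with c1 standing for C(P_1)."""
    ln_m = math.log(num_messages)
    bracket = 1.0 - p_e * (ln_m - math.log(p_e) + 1.0) - 1.0 / ln_m
    return ln_m * bracket / c1


def phase2_time_bound(num_messages, p_e, d2, big_f):
    """Right-hand side of (36), with d2 standing for D(P_2)."""
    ln_m = math.log(num_messages)
    numerator = -math.log(p_e) - big_f - math.log(ln_m - math.log(p_e) + 1.0)
    return numerator / d2
```

For a binary symmetric channel with crossover probability 0.1, `max_log_ratio` returns $\ln 9$. With a large message set and a small error probability, the phase 1 bound is close to $(\ln M)/C(\mathcal{P}_1)$ and the phase 2 bound is dominated by $(-\ln P_e)/D(\mathcal{P}_2)$, which matches the remarks following (32) and (36).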

Theorem 4 Assume a DMC with all $P_{kj} > 0$. Let $\mathcal{P} > 0$, $0 < R < C(\mathcal{P})$, and $\delta > 0$ be arbitrary.
Then, for all sufficiently large $\bar{\tau}$, all variable-length block codes with

• expected energy $E[S_\tau] \le \mathcal{P}\bar{\tau} + \delta$

• $M \ge \exp[\bar{\tau}(R + \delta)]$ equiprobable messages

must satisfy

$$P_e \ge \exp\{-\bar{\tau}[E(R,\mathcal{P}) + \delta]\}. \tag{37}$$

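We now give an intuitive justification of the theorem; a proof is given in the appendix. Leaving
out the 'negligible' terms, (32) and (36) are

$$E[\tau_1] \ge \frac{\ln M}{C(\mathcal{P}_1)}; \qquad E[\tau - \tau_1] \ge \frac{-\ln P_e}{D(\mathcal{P}_2)}.$$

Defining $\eta = E[\tau_1]/\bar{\tau}$ for the given code and rearranging terms,

$$\frac{\ln M}{\bar{\tau}} \le \eta\, C(\mathcal{P}_1); \qquad \frac{-\ln P_e}{\bar{\tau}} \le (1 - \eta) D(\mathcal{P}_2). \tag{38}$$

For given $\mathcal{P}_1$, $\mathcal{P}_2$, and $\eta$, (38) is the converse of the corresponding achievability lemma in Section 2. The exponent $E(R,\mathcal{P})$ is the result of
optimizing over these parameters for given $R$, $\mathcal{P}$. The proof in the appendix treats the neglected
quantities and this optimization carefully.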
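The optimization over $\eta$ suggested by (38) can be illustrated in the simplest setting. The sketch below assumes there is no cost constraint, so that $C(\mathcal{P}_1)$ and $D(\mathcal{P}_2)$ reduce to constants $C$ and $D$; it is not the full optimization over $\mathcal{P}_1$, $\mathcal{P}_2$, and $\eta$ that defines $E(R,\mathcal{P})$. Under that assumption a grid search over $\eta$ recovers Burnashev's linear exponent $D(1 - R/C)$. The function name and grid size are illustrative.

```python
def exponent_upper_bound(rate, capacity, divergence, steps=10_000):
    """Maximize (1 - eta) * D over eta in [0, 1] subject to rate <= eta * C.

    Assumes no cost constraint, so C and D do not depend on how energy is
    split between the two phases.
    """
    best = 0.0
    for i in range(steps + 1):
        eta = i / steps
        if rate <= eta * capacity:  # rate condition from (38)
            best = max(best, (1.0 - eta) * divergence)
    return best
```

The maximum is attained at the smallest feasible $\eta$, namely $\eta = R/C$: phase 1 is made as short as the rate allows, leaving the remaining fraction $1 - R/C$ of the block for the binary hypothesis test of phase 2.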



4 Extension to Other Memoryless Channels 

The channel model of Sections 2 and 3 assumes finite input and output alphabets, but, as will
be seen, the analysis is more general and, with some assumptions, continues to hold with minor
changes such as replacing sums and max's with integrals and sup's. A later paper by Burnashev
[3] (see footnote 18) extends his results for the DMC to more general memoryless channels, and the results to follow
generalize this to channels with cost constraints. In this section, we specify a family of channels
with cost constraints in which Theorems 1 and 4 are valid, i.e., for which upper and lower bounds
to the reliability function are equal to $E(R, \mathcal{P})$.

18 The authors are grateful to the reviewers of [1] and to Peter Berlin for pointing out this reference.

4.1 Assumptions about the channel model

The channel input and output alphabets can be countable or uncountably infinite and will be
denoted by $\mathcal{X}$ and $\mathcal{Y}$ respectively. Each element $x \in \mathcal{X}$ has an associated cost $\rho_x$ and as before we
assume that the infimum of these costs is equal to zero.

Each input $x \in \mathcal{X}$ will have an associated probability measure $\Phi_x$ governing the output conditional
on input $x$; this replaces the transition matrix $\{P_{kj}\}$ for the DMC case. We will assume that
there exists a probability measure $\nu$ with respect to which all $\Phi_x$ are absolutely continuous:

$$\forall x \in \mathcal{X}, \quad \nu \gg \Phi_x.$$

Indeed without this $\nu$ one can hardly begin to analyze such a memoryless channel. For each $x \in \mathcal{X}$,
let $\psi_x$ be the Radon-Nikodym derivative (see footnote 19) of $\Phi_x$ with respect to $\nu$:

$$\psi_x = \frac{d\Phi_x}{d\nu}. \tag{39}$$
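Our previous definitions can be extended by replacing sums with integrals, max's with sup's, etc.:

$$C(\mathcal{P}) = \sup_{\mu:\ \int_{\mathcal{X}} \rho_x\, d\mu \le \mathcal{P}} \int_{\mathcal{X}} d\mu \int_{\mathcal{Y}} \psi_x \ln\frac{\psi_x}{\psi_\mu}\, d\nu \tag{40}$$

where $\psi_\mu = \int_{\mathcal{X}} \psi_x\, d\mu$ is the density, with respect to $\nu$, of the unconditional probability measure on $\mathcal{Y}$. Similarly,

$$D_x = \sup_{a \in \mathcal{X}} \int_{\mathcal{Y}} \psi_x \ln\frac{\psi_x}{\psi_a}\, d\nu \tag{41}$$

$$D(\mathcal{P}) \triangleq \sup_{\mu:\ \int_{\mathcal{X}} \rho_x\, d\mu \le \mathcal{P}} \int_{\mathcal{X}} D_x\, d\mu. \tag{42}$$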

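The following assumption will ensure that $D(\mathcal{P})$ and $C(\mathcal{P})$ are finite for all $\mathcal{P} > 0$.

Assumption 1 The discrete-time memoryless channel satisfies the following (see footnote 20):

• $\forall x \in \mathcal{X}$, $D_x < \infty$ and $\rho_x < \infty$

• $\forall \mathcal{P} > 0$, $A(\mathcal{P}) \triangleq \sup_{x:\ \rho_x \le \mathcal{P}} D_x < \infty$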

• $\limsup_{\mathcal{P} \to \infty} A(\mathcal{P})/\mathcal{P} < \infty$

Using the above assumption in place of the DMC assumption, it is straightforward to verify that
the proof of Theorem 1 still holds (see footnote 21).

Proceeding on to the converse, it can be seen that the proofs of Lemmas 6 to 10 all hold
under Assumption 1. In verifying these proofs, however, one must assume that all codes have
finite expected energy; this is tacitly assumed in Theorem 4 since we are assuming throughout that
$\bar{\tau} < \infty$. Lemma 11 does not hold in all cases, and in particular does not even hold for the amplitude-limited
AWGNC. The following additional assumption will hold in many cases where Lemma 11
does not hold and will enable Theorem 4 to be proven.



19 If $\mathcal{X}$ and $\mathcal{Y}$ are each the set of real numbers, and if probability densities exist, then $\psi_x(y)$ can be taken as the
probability density of $y$ conditional on $x$.

20 This assumption is clearly satisfied for every DMC with all $P_{kj} > 0$. If some $P_{kj}$ is zero, then $D(\mathcal{P})$ is infinite
for all $\mathcal{P} > 0$ and the assumption is not satisfied.

21 Some additional $\delta$'s are required because of the supremums in the definitions, but they can be made negligible by
increasing $\ell$.

Assumption 2 The discrete-time memoryless channel has an associated function $\epsilon(\cdot)$ such that:

• For any coding and any $n$,

$$E\left[\left[\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}}\right]^{(a)} \,\middle|\, \mathcal{F}_n\right] \le \epsilon(a)\left(1 + E[(S_{n+1} - S_n) \mid \mathcal{F}_n]\right) \tag{43}$$

• $\lim_{a \to \infty} \epsilon(a) = 0$

where $[x]^{(a)} = x\, \mathbb{1}_{\{x > a\}}$.

Theorem 4 is proved for stationary memoryless channels satisfying Assumptions 1 and 2 in the
appendix.

4.2 Discussion of extended channel models 

It is natural at this point to ask what kinds of channels satisfy Assumptions 1 and 2. A partial
answer comes from considering the class of channels without cost constraints considered by Burnashev
in [3]. He shows that any channel satisfying the following conditions has an error exponent
given by $E = (1 - R/C)D$.



• $D = \sup_{a, x \in \mathcal{X}} \int \psi_a \ln\frac{\psi_a}{\psi_x}\, d\nu < \infty$

(j>(a) = sup a ^ £X fipaln i, , . | di/ < oo, and lim a _ too ^(a) = 



• At least one of the following is satisfied 



— The channel is an additive noise channel whose input alphabet is a closed interval on 
the real line and whose noise has a unimodal density. 

/ , \ CM-JO 

— 3K > such that sup Q( g g ^ J ifj a (In du < 



x. 



His first assumption is that $\sup_{x \in \mathcal{X}} D_x < \infty$. This implies that the channel satisfies our Assumption 1
for all non-negative finite cost assignments. He shows that the other assumptions imply
that a function $\epsilon(a)$, $a > 0$, exists such that $\lim_{a \to \infty} \epsilon(a) = 0$ and such that for all codes,

$$E\left[\left[\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}}\right]^{(a)} \,\middle|\, \mathcal{F}_n\right] \le \epsilon(a). \tag{44}$$

This implies that the channel satisfies our Assumption 2 for all non-negative finite cost assignments.
Thus his assumptions imply our assumptions for all cost assignments and thus imply that, for all
cost assignments, the corresponding $E(R,\mathcal{P})$ exists and is the reliability function.

We next give an example of a channel that does not satisfy the Burnashev conditions and does 
not have a finite reliability function without a cost constraint, but does satisfy our conditions and 
has a cost constrained reliability function E(R,V). 

Let $\mathcal{X}$ and $\mathcal{Y}$ be the set of non-negative integers and assume the cost function $\rho_x = x^2$, i.e., the
cost of each input letter is equal to the square of the value of the corresponding real number. Let the
transition probability $P_{xy}$ be

P *y = o ( ™ 1/] 



3 V 2» 
where $\delta[\cdot]$ is 1 when its argument is 0, and 0 elsewhere.

This channel can be proved to satisfy Assumptions 1 and 2 and thus its error exponent is given
by $E(R,\mathcal{P})$. On the other hand it does not satisfy the necessary conditions of [3]. Furthermore,
the reliability function is unbounded for any rate below capacity if there is no cost constraint.

5 Conclusions 

Theorems 1 and 4 specify the reliability function for the class of variable-length block codes for
DMC's with cost constraints, all $P_{kj} > 0$, and ideal (see footnote 22) feedback. The results are extended to a more
general class of discrete-time memoryless channels satisfying Assumptions 1 and 2 of Section 4.
AWGNC's with amplitude and power constraints provide examples satisfying these assumptions.
Theorem 2 shows that zero error probability is achievable at all rates up to the cost constrained
capacity and moreover is achievable by a very simple scheme.

The rate and the error exponent are specified in terms of the expected block length. By looking
at a long sequence of successive message transmissions, it is evident from the law of large numbers
that the rate corresponds to the average number of message bits transmitted per unit time. In the
same way, the cost constraint is satisfied as an average over both time and channel behavior. The
theorems then say essentially that the probability of error $P_{e,\min}$ for the best variable-length block
code of given $R$, $\mathcal{P}$, and $\bar{\tau}$ satisfies $\frac{-\ln P_{e,\min}}{\bar{\tau}} \to E(R,\mathcal{P})$ as $\bar{\tau} \to \infty$.

Mathematically these theorems are quite similar to the conventional non-feedback block coding 
results except for the following differences: first, the reliability function is known for all rates rather 
than rates sufficiently close to capacity; second the reliability function is concave (and sometimes 
positive at capacity); and third the reliability function is given in terms of expected rather than 
actual block length. The first two differences have been discussed in detail in the previous sections. 

In order to understand the role of the expected block length on the exponent, look at the coding
scheme used for achievability. The decoding time $\tau$ is close to the fixed block length of the underlying error-and-erasure
code and it is this code that determines $E(R,\mathcal{P})$. In other words, the variable-length feature is
essential for the small error probability, but $\tau$ is constant with high probability.

One might think that a variable-length block code has many system disadvantages over a fixed-length
code, but this is not really true (except for time-sensitive systems) since variable-length
protocols are almost invariably used at all higher layers. As discussed in Section 2, it can be shown
that the expected additional queuing delay introduced by those codes can be made to approach
0 with increasing $\bar{\tau}$.

A Proofs 
Proof of Lemma 3:

As can be seen from Figure 2, $C(x)/x$ is the slope of the straight line between the points $(0, 0)$ and
$(x, C(x))$. It is constant for $0 < x \le \beta$ and is continuous and strictly decreasing for $x > \beta$. Here
$\beta \ge 0$ is the largest $x$ for which $C(y)/y = C(x)/x$ for all $y \in (0, x)$; $\beta = 0$ for cases (a) and (b) of
the figure. Letting $x = \mathcal{P}/\eta$ shows that $\eta C(\mathcal{P}/\eta)$, as a function of $\eta > 0$, is continuous and strictly
increasing for $\eta < \mathcal{P}/\beta$ and constant for $\eta \ge \mathcal{P}/\beta$. It is equal to 0 at $\eta = 0$, and to $C(\mathcal{P})$ at $\eta = 1$.

Since $0 < R < C(\mathcal{P})$, there is a unique $\eta^*_{R,\mathcal{P}} \in (0, 1)$ satisfying $\eta^*_{R,\mathcal{P}}\, C(\mathcal{P}/\eta^*_{R,\mathcal{P}}) = R$ (and thus also
satisfying $\eta^*_{R,\mathcal{P}} < \mathcal{P}/\beta$). Since $C(0) \le C(x) \le C^*$ for all $x \ge 0$, we have $C(0) \le C(\mathcal{P}/\eta^*_{R,\mathcal{P}}) \le C^*$.
Thus $C(0) \le R/\eta^*_{R,\mathcal{P}} \le C^*$ and $\eta^*_{R,\mathcal{P}}$ satisfies $\frac{R}{C^*} \le \eta^*_{R,\mathcal{P}} \le \frac{R}{C(0)}$. QED

22 Indeed, as argued previously noiseless feedback of rate C or higher with bounded delay is enough

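Proof of Lemma 4:

Note that $C(x)$ is continuous and strictly increasing from $C(0)$ at $x = 0$ to $C^*$ at $x = \mathcal{P}^*$ (see
Figure 2). Thus, for each $\eta$ in the interval $[\frac{R}{C^*}, \frac{R}{C(0)}]$, $R = \eta C(\mathcal{P}_1)$ has a unique solution (see footnote 23) for
$\mathcal{P}_1 \le \mathcal{P}^*$. Since $\eta^*_{R,\mathcal{P}} \in [\frac{R}{C^*}, \frac{R}{C(0)}]$, it follows that $I_{R,\mathcal{P}} \subset [\frac{R}{C^*}, \frac{R}{C(0)}]$, so $R = \eta C(\mathcal{P}_1)$ has a unique
solution for all $\eta \in I_{R,\mathcal{P}}$, establishing the first itemized property for $I_{R,\mathcal{P}}$.

Next we must show that $\eta\mathcal{P}_1 \le \mathcal{P}$ for the $\mathcal{P}_1 \le \mathcal{P}^*$ satisfying $R = \eta C(\mathcal{P}_1)$. Since $\eta \ge \eta^*_{R,\mathcal{P}}$
by assumption, the monotonicity of $\eta C(\mathcal{P}/\eta)$ implies that $\eta C(\mathcal{P}/\eta) \ge \eta^*_{R,\mathcal{P}} C(\mathcal{P}/\eta^*_{R,\mathcal{P}})$. Since
$\eta^*_{R,\mathcal{P}} C(\mathcal{P}/\eta^*_{R,\mathcal{P}}) = R$, we have $\eta C(\mathcal{P}/\eta) \ge R$. Since $C(\mathcal{P}_1)$ is strictly increasing for $\mathcal{P}_1 \le \mathcal{P}^*$, this
shows that $\mathcal{P}_1 \le \mathcal{P}/\eta$ for $\mathcal{P}_1 \le \mathcal{P}^*$.

Finally, consider $\eta \notin I_{R,\mathcal{P}}$. If $\eta < R/C^*$, then no solution exists for $R = \eta C(\mathcal{P}_1)$. If
$\frac{R}{C^*} \le \eta < \eta^*_{R,\mathcal{P}}$, then the strict monotonicity of $\eta C(\mathcal{P}/\eta)$ in this range shows that $\eta C(\mathcal{P}/\eta) <
\eta^*_{R,\mathcal{P}} C(\mathcal{P}/\eta^*_{R,\mathcal{P}}) = R$. For $\mathcal{P}_1$ satisfying $R = \eta C(\mathcal{P}_1)$, then, $\eta C(\mathcal{P}/\eta) < \eta C(\mathcal{P}_1)$ and the strict
monotonicity of $C(\cdot)$ shows that $\mathcal{P}/\eta < \mathcal{P}_1$, so the condition on $\mathcal{P}_1$ must be violated. Similarly if
$\eta > \frac{R}{C(0)}$ then $\forall \mathcal{P}_1 \ge 0$, $\eta C(\mathcal{P}_1) > R$, which will violate the rate condition. QED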

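Proof of Lemma 5 (Concavity of $E(R, \mathcal{P}, \eta)$):

For any given DMC, let $\Omega$ be the set of triples $(R, \mathcal{P}, \eta)$ for which $0 < R < C(\mathcal{P})$ and $\eta \in I_{R,\mathcal{P}}$:

$$\Omega = \{(R, \mathcal{P}, \eta) : \mathcal{P} > 0,\ 0 < R < C(\mathcal{P}),\ \eta \in I_{R,\mathcal{P}}\}.$$

First we show that $\Omega$ is a convex set, and then we show that $E(R, \mathcal{P}, \eta)$ is a concave function over
the domain $\Omega$.

Assume that $(R', \mathcal{P}', \eta')$ and $(R'', \mathcal{P}'', \eta'')$ are arbitrary points of $\Omega$. We show that $\Omega$ is a convex
set by showing that for any $\alpha \in (0, 1)$, the point $(R_\alpha, \mathcal{P}_\alpha, \eta_\alpha)$ given by

$$\mathcal{P}_\alpha = \alpha\mathcal{P}' + (1 - \alpha)\mathcal{P}'', \qquad R_\alpha = \alpha R' + (1 - \alpha)R'', \qquad \eta_\alpha = \alpha\eta' + (1 - \alpha)\eta'' \tag{45}$$

is also in the set $\Omega$.

$R_\alpha$ is clearly positive, and using the concavity of $C(\cdot)$, we get

$$R_\alpha < \alpha C(\mathcal{P}') + (1 - \alpha)C(\mathcal{P}'') \le C(\alpha\mathcal{P}' + (1 - \alpha)\mathcal{P}'') = C(\mathcal{P}_\alpha).$$

We must also show that $\eta_\alpha \in I_{R_\alpha,\mathcal{P}_\alpha}$. It suffices to show that $\eta_\alpha \ge \eta^*_{R_\alpha,\mathcal{P}_\alpha}$, $\eta_\alpha \le 1$ and $\eta_\alpha \le
R_\alpha/C(0)$. The latter two conditions are obvious, so we must show only that $\eta_\alpha \ge \eta^*_{R_\alpha,\mathcal{P}_\alpha}$. As shown
in the proof of Lemma 4, the condition $\eta \ge \eta^*_{R,\mathcal{P}}$ is equivalent to $R \le \eta C(\mathcal{P}/\eta)$. Hence

$$R' \le \eta' C\!\left(\frac{\mathcal{P}'}{\eta'}\right), \qquad R'' \le \eta'' C\!\left(\frac{\mathcal{P}''}{\eta''}\right), \qquad R_\alpha \le \alpha\eta' C\!\left(\frac{\mathcal{P}'}{\eta'}\right) + (1 - \alpha)\eta'' C\!\left(\frac{\mathcal{P}''}{\eta''}\right).$$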



23 Note that for $\eta = R/C^*$, the equation $R = \eta C(\mathcal{P}_1)$ is also satisfied for all $\mathcal{P}_1 > \mathcal{P}^*$. These values of $\mathcal{P}_1$ are
omitted from the optimization since they can be reduced to $\mathcal{P}^*$ thus allowing more energy for phase 2.






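Thus,

$$\begin{aligned}
R_\alpha &\le \eta_\alpha\left[\frac{\alpha\eta'}{\eta_\alpha} C\!\left(\frac{\mathcal{P}'}{\eta'}\right) + \frac{(1 - \alpha)\eta''}{\eta_\alpha} C\!\left(\frac{\mathcal{P}''}{\eta''}\right)\right] \\
&\le \eta_\alpha C\!\left(\frac{\alpha\eta'}{\eta_\alpha}\frac{\mathcal{P}'}{\eta'} + \frac{(1 - \alpha)\eta''}{\eta_\alpha}\frac{\mathcal{P}''}{\eta''}\right) \\
&= \eta_\alpha C\!\left(\frac{\alpha\mathcal{P}' + (1 - \alpha)\mathcal{P}''}{\eta_\alpha}\right) = \eta_\alpha C\!\left(\frac{\mathcal{P}_\alpha}{\eta_\alpha}\right).
\end{aligned}$$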



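Consequently $\Omega$ is a convex region. We next show that $E(R, \mathcal{P}, \eta)$ is concave over $\Omega$. That is, given
points $(R', \mathcal{P}', \eta')$, $(R'', \mathcal{P}'', \eta'')$ and $(R_\alpha, \mathcal{P}_\alpha, \eta_\alpha)$ in $\Omega$, we will show that $L_\alpha \le E(R_\alpha, \mathcal{P}_\alpha, \eta_\alpha)$ where
$L_\alpha = \alpha E(R', \mathcal{P}', \eta') + (1 - \alpha)E(R'', \mathcal{P}'', \eta'')$.

$$\begin{aligned}
L_\alpha &= \alpha(1 - \eta')D(\mathcal{P}_2') + (1 - \alpha)(1 - \eta'')D(\mathcal{P}_2'') \\
&\le (1 - \eta_\alpha)\, D\!\left(\frac{\alpha(1 - \eta')\mathcal{P}_2' + (1 - \alpha)(1 - \eta'')\mathcal{P}_2''}{1 - \eta_\alpha}\right) \\
&= (1 - \eta_\alpha)\, D\!\left(\frac{\alpha\left(\mathcal{P}' - \eta' C^{-1}(R'/\eta')\right) + (1 - \alpha)\left(\mathcal{P}'' - \eta'' C^{-1}(R''/\eta'')\right)}{1 - \eta_\alpha}\right) \\
&= (1 - \eta_\alpha)\, D\!\left(\frac{\mathcal{P}_\alpha - \left[\alpha\eta' C^{-1}(R'/\eta') + (1 - \alpha)\eta'' C^{-1}(R''/\eta'')\right]}{1 - \eta_\alpha}\right) \\
&\le (1 - \eta_\alpha)\, D\!\left(\frac{\mathcal{P}_\alpha - \eta_\alpha C^{-1}\!\left(\frac{\alpha R' + (1 - \alpha)R''}{\eta_\alpha}\right)}{1 - \eta_\alpha}\right) \\
&= E(R_\alpha, \mathcal{P}_\alpha, \eta_\alpha).
\end{aligned}$$

The first inequality above used the concavity of $D(\cdot)$ combined with $1 - \eta_\alpha = \alpha(1 - \eta') + (1 - \alpha)(1 - \eta'')$.
The second inequality used the convexity of $C^{-1}(\cdot)$ along with the fact that $D(\cdot)$ is non-decreasing.

QED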

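Proof of Lemma 6:

We will first prove that $E[|V_n|] < \infty$. Recall

$$V_n = \mathcal{H}_{\mathcal{F}_n} + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_n \mid \mathcal{F}_n] - \mathcal{P}n\right).$$

Since $\gamma_C^{\mathcal{P}} \ge 0$, $C(\mathcal{P}) \ge 0$ and $E[S_n \mid \mathcal{F}_n] \ge 0$,

$$|V_n| \le |\mathcal{H}_{\mathcal{F}_n}| + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_n \mid \mathcal{F}_n] + \mathcal{P}n\right).$$

Note that $|\mathcal{H}_{\mathcal{F}_n}| \le \ln M$. Therefore

$$E[|V_n|] \le \ln M + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_n] + \mathcal{P}n\right).$$

In addition, for any finite energy code (see footnote 24), $E[S_n] < \infty$. Consequently $E[|V_n|] < \infty$.

24 The convention for extending the encoding algorithms beyond decoding time is assigning all of the codewords to
the same zero cost symbol.

We next prove that $E[V_{n+1} \mid \mathcal{F}_n] \ge V_n$.

$$\begin{aligned}
E[V_n \mid \mathcal{F}_n] &= E\left[\mathcal{H}_{\mathcal{F}_n} + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}}(S_n - n\mathcal{P}) \mid \mathcal{F}_n\right] \\
E[V_{n+1} \mid \mathcal{F}_n] &= E\left[\mathcal{H}_{\mathcal{F}_{n+1}} + (n+1)C(\mathcal{P}) + \gamma_C^{\mathcal{P}}(S_n + \rho_{X_{n+1}} - (n+1)\mathcal{P}) \mid \mathcal{F}_n\right] \\
E[V_{n+1} \mid \mathcal{F}_n = f_n] &= V_n + C(\mathcal{P}) - \gamma_C^{\mathcal{P}}\mathcal{P} - I(\theta; Y_{n+1} \mid \mathcal{F}_n = f_n) + \gamma_C^{\mathcal{P}} E[\rho_{X_{n+1}} \mid \mathcal{F}_n = f_n]
\end{aligned}$$

Because of the Markov relation $\theta \leftrightarrow X_{n+1} \leftrightarrow Y_{n+1}$, which holds for all $f_n$, combined with the data
processing inequality, we have

$$E[V_{n+1} \mid \mathcal{F}_n = f_n] \ge V_n + C(\mathcal{P}) - \gamma_C^{\mathcal{P}}\mathcal{P} - I(X_{n+1}; Y_{n+1} \mid \mathcal{F}_n = f_n) + \gamma_C^{\mathcal{P}} E[\rho_{X_{n+1}} \mid \mathcal{F}_n = f_n].$$

Note that

$$C(\mathcal{P}) - \gamma_C^{\mathcal{P}}\mathcal{P} = \max_{\varphi}\left[\mathcal{I}(\varphi) - \gamma_C^{\mathcal{P}} E_\varphi[\rho_X]\right],$$

where $\mathcal{I}(\varphi)$ is the mutual information corresponding to the input distribution $\varphi$. Thus

$$E[V_{n+1} \mid \mathcal{F}_n] \ge V_n$$

and the stochastic sequence $\{V_n; n \ge 0\}$ is a submartingale. QED

Proof of Lemma 7: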

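We use the following shorthand for a given $f_n$:

$$\begin{aligned}
p(i) &= p_i(f_n), & p(i \mid j) &= P[\theta = i \mid Y_{n+1} = j, \mathcal{F}_n = f_n], \\
\varphi(k \mid i) &= P[X_{n+1} = k \mid \mathcal{F}_n = f_n, \theta = i], & \varphi(k) &= P[X_{n+1} = k \mid \mathcal{F}_n = f_n], \\
\psi(j \mid i) &= P[Y_{n+1} = j \mid \mathcal{F}_n = f_n, \theta = i], & \psi(j) &= P[Y_{n+1} = j \mid \mathcal{F}_n = f_n].
\end{aligned}$$

We proceed to upper bound $E[\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}} \mid \mathcal{F}_n = f_n]$.

$$\begin{aligned}
E[\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}} \mid \mathcal{F}_n = f_n]
&= \sum_{j=1}^{|\mathcal{Y}|} \psi(j) \ln\frac{-\sum_{i=1}^{M} p(i)\ln p(i)}{-\sum_{i=1}^{M} p(i \mid j)\ln p(i \mid j)} \\
&\le \sum_{j=1}^{|\mathcal{Y}|} \psi(j) \sum_{i=1}^{M} \frac{-p(i)\ln p(i)}{\sum_{m} -p(m)\ln p(m)} \ln\frac{p(i)\ln p(i)}{p(i \mid j)\ln p(i \mid j)},
\end{aligned}$$

where we have used the log-sum inequality for each $j$ above.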



Using $\frac{p(i \mid j)}{p(i)} = \frac{\psi(j \mid i)}{\psi(j)}$ and defining $\psi(j \mid \ne i) = P[Y_{n+1} = j \mid \mathcal{F}_n = f_n, \theta \ne i]$,

Mj\ i n p( i ) ln ^( i ) = ln ( jjU) ln p(^) A 



< p(0 E 1,1 Hfjij + E *OI J ) 1,1 fjf§ < 46 > 

In order to verify the inequality in (|46f) . denote the right side minus the left side as A, and note 
that by substitution, 

A= P(*)£^(il01n^ +U-P«)E^)ln^ 



+ 



The first two terms above are divergences, and thus non-negative. The third term can be 
rewritten as below and is shown to be nonnegative by applying Jensen's inequality to the function 
ln (- ln(l + ax)) which is convex for any a > 0. 

1 ( 1 _i_ ip(j\i) 

ip(j\i)hxp(i\j)\ *l>U\i) V + p(0 



3 3 \ p(t) J 

Similarly, the fourth term can be rewritten as below and is shown to be nonnegative by applying 
Jensen's inequality to the convex function ln (ln(l + ^)) for a > 0. 

In (l + 1_p ^ ^'1^ 



ln I 1 + w 

This verifies (146 p . The final term in (I46p can be upper bounded by 



E*i^S£ = E E f( k ® p *> 



(6) 

< e^i*)^- 

fe 

where (a) uses the log sum inequality over k for each j and (6) follows from the definition of D/%. 
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By a similar argument on the first term in (|46p . 



Substituting the above two inequalities into ([461) . 

^V(j)ln- PWlnpW 

Thus 



p(i|j)lnp(i|i) 



< 



J>(fc)D fe . 



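$$E[\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}} \mid \mathcal{F}_n = f_n] \le \sum_k \varphi(k) D_k = E[D_{X_{n+1}} \mid \mathcal{F}_n = f_n], \tag{47}$$

which is equivalent to (23). QED

Proof of Lemma 8: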

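We will first prove that $E[W_{n+1} \mid \mathcal{F}_n] \ge W_n$. Recalling that $W_n = \ln \mathcal{H}_{\mathcal{F}_n} + nD(\mathcal{P}) +
\gamma_D^{\mathcal{P}}(E[S_n \mid \mathcal{F}_n] - n\mathcal{P})$ and using Lemma 7,

$$\begin{aligned}
E[W_{n+1} \mid \mathcal{F}_n] &\ge W_n + D(\mathcal{P}) - E[D_{X_{n+1}} \mid \mathcal{F}_n] + \gamma_D^{\mathcal{P}}\left(E[\rho_{X_{n+1}} \mid \mathcal{F}_n] - \mathcal{P}\right) \\
&\ge W_n,
\end{aligned}$$

where the last step holds because $D(\mathcal{P}) - \gamma_D^{\mathcal{P}}\mathcal{P}$ is the maximum of $E[D_X] - \gamma_D^{\mathcal{P}} E[\rho_X]$ over input distributions.

We next prove that $E[|W_n|] < \infty$. Using the definition of $W_n$ and the fact that $\gamma_D^{\mathcal{P}} \ge 0$, $D(\mathcal{P}) \ge 0$
and $E[S_n \mid \mathcal{F}_n] \ge 0$,

$$|W_n| \le |\ln \mathcal{H}_{\mathcal{F}_n}| + nD(\mathcal{P}) + \gamma_D^{\mathcal{P}}\left(E[S_n \mid \mathcal{F}_n] + \mathcal{P}n\right).$$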

Note that since Hjr n < Hr = InM. 



E[|W„|] < In M + E 



In 



+ nD(P)+ 7 £(E [S n ]+Vn) 



Since for any finite energy coda I E [<S n ] < 00, proving that E 
00. 

Note that 



In 



< 00, will prove E [|W n |] < 



E 



1 ^0 


= E 







n nj 



_k=l ' 

Using the convexity of ~D(V) function together with equation (147j) we get 



E 



£ ln 



< nE 



D 



E[S„] 



r? 



ln^ 



< 00 and thus E [\W%\] < 00. 



QED 



Recalling that E [S n ] < 00, will prove E 
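Proof of Lemma 9:

25 Recall the convention of extending the encoding algorithm beyond decoding time by assigning all of the codewords
to the same zero cost symbol.

By the definition of $V_n$,

$$V_{\tau_i} = \mathcal{H}_{\mathcal{F}_{\tau_i}} + \tau_i C(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_{\tau_i} \mid \mathcal{F}_{\tau_i}] - \mathcal{P}\tau_i\right).$$

Since the expected value of each term on the right side exists,

$$\begin{aligned}
E[V_{\tau_i}] &= E[\mathcal{H}_{\mathcal{F}_{\tau_i}}] + E[\tau_i]C(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_{\tau_i}] - \mathcal{P}E[\tau_i]\right) && (48) \\
&\le E[\mathcal{H}_{\mathcal{F}_{\tau_i}}] + E[\tau_i]C(\mathcal{P}), && (49)
\end{aligned}$$

where we have used $\gamma_C^{\mathcal{P}} \ge 0$ along with the hypothesis of the lemma that $E[S_{\tau_i}] \le \mathcal{P}E[\tau_i]$. Since
$E[V_0] = E[\mathcal{H}_{\mathcal{F}_0}] = \ln M$, the result of the lemma, i.e., $C(\mathcal{P})E[\tau_i] \ge E[\mathcal{H}_{\mathcal{F}_0}] - E[\mathcal{H}_{\mathcal{F}_{\tau_i}}]$, will then
hold if

$$E[V_{\tau_i}] \ge E[V_0] \tag{50}$$

holds. Doob's theorem (see footnote 26) states that a submartingale $V_n$ satisfies (50) if it satisfies the following
two conditions:

$$E[|V_{\tau_i}|] < \infty \quad \text{and} \quad \lim_{n \to \infty} E\left[|V_n|\, \mathbb{1}_{\{\tau_i > n\}}\right] = 0. \tag{51}$$

The first condition follows from modifying (48) to bound $|V_{\tau_i}|$:

$$|V_{\tau_i}| \le |\mathcal{H}_{\mathcal{F}_{\tau_i}}| + \tau_i C(\mathcal{P}) + \gamma_C^{\mathcal{P}}\left(E[S_{\tau_i} \mid \mathcal{F}_{\tau_i}] + \mathcal{P}\tau_i\right).$$

To establish the second condition, let

$$\begin{aligned}
\xi_n &= |V_n|\, \mathbb{1}_{\{\tau_i > n\}} && (52) \\
&= \left|\mathcal{H}_{\mathcal{F}_n} + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}} E[S_n \mid \mathcal{F}_n] - \gamma_C^{\mathcal{P}}\mathcal{P}n\right| \mathbb{1}_{\{\tau_i > n\}} \\
&\le \left[\mathcal{H}_{\mathcal{F}_n} + nC(\mathcal{P}) + \gamma_C^{\mathcal{P}} E[S_n \mid \mathcal{F}_n] + \gamma_C^{\mathcal{P}}\mathcal{P}n\right] \mathbb{1}_{\{\tau_i > n\}}. && (53)
\end{aligned}$$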

We want to find a random variable $\zeta$ of finite expectation that upper bounds $\xi_n$ for each $n$; the
troublesome term here is $E[S_n \mid \mathcal{F}_n]$. Let $S_n(m)$ (which is measurable in $\mathcal{F}_n$) be the cost of the
codeword corresponding to the message $m$ at time $n$. The following very weak bound is sufficient
for our purposes.



M 



m=l 



E [<S n | T n \ = P i°= m \ Fn] S n (m) <J2S n ( 
Substituting flM]) into fl53 

InM + nC{V) + 7? ^ S n {m) + y%V 

m=l 
M 

InM + n C{V) + 7c S -^ m ) + "£ Pr < 



(54) 



(a) 
< 



m=l 



-71 



< 



m=l 

M 



{n>n} 



In M + 7iC(P) + 7S E 5 ^ M + 



m=l 



< [In M + Ti C(7>) + 7 £ME [«S T J + 7 M 4 C 



(55) 



In (a) we have used the fact that the indicator function is zero if $\tau_i \le n$, and that $\mathcal{S}_{\tau_i}(m) \ge \mathcal{S}_n(m)$ if $\tau_i > n$. Note that $0 \le \xi_n \le \zeta$ for all $n$. Since $E[\tau_i] < \infty$ and $E[\mathcal{S}_{\tau_i}] \le \mathcal{P} E[\tau_i] < \infty$, $E[\zeta] < \infty$. Since $\lim_{n \to \infty} P[\xi_n = 0] = 1$, Lebesgue's dominated convergence theorem shows that $\lim_{n \to \infty} E[\xi_n] = 0$.

QED

³ See, for example, Shiryaev [17], page 457.
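As a sanity check on the optional-stopping step (50), the following Python sketch computes $E[X_\tau]$ exactly for a toy submartingale. The biased walk, the barriers, and the name `stopped_walk_expectations` are illustrative assumptions and are not part of the paper; the walk merely stands in for $V_n$, with a bounded stopping time for which both conditions in (51) hold trivially.

```python
from fractions import Fraction


def stopped_walk_expectations(p, start, lower, upper, horizon):
    """Exact (E[X_tau], E[tau]) for a +/-1 walk with up-probability p.

    tau is the first time the walk reaches `lower` or `upper`, capped at
    `horizon`. For p >= 1/2 the walk is a submartingale (toy stand-in for V_n).
    """
    alive = {start: Fraction(1)}  # position -> probability, not yet stopped
    exp_x = Fraction(0)
    exp_tau = Fraction(0)
    for n in range(horizon + 1):
        nxt = {}
        for x, prob in alive.items():
            if x <= lower or x >= upper or n == horizon:
                # The walk stops here: accumulate its contribution.
                exp_x += prob * x
                exp_tau += prob * n
                continue
            nxt[x + 1] = nxt.get(x + 1, Fraction(0)) + prob * p
            nxt[x - 1] = nxt.get(x - 1, Fraction(0)) + prob * (1 - p)
        alive = nxt
    return exp_x, exp_tau


if __name__ == "__main__":
    p = Fraction(3, 5)
    ex, et = stopped_walk_expectations(p, 0, -3, 4, 50)
    # Submartingale property at the stopping time: E[X_tau] >= X_0 = 0.
    print(float(ex), float(et), ex >= 0)
```

In this toy case $X_n - (2p-1)n$ is a martingale, so $E[X_\tau] = (2p-1)E[\tau] \ge X_0$, which mirrors the way (50) is combined with (49) in the proof.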



Proof of Lemma 10

Lemma 8 showed that the sequence

\[
W_n = \ln \mathcal{H}_{\mathcal{F}_n} + n D(\mathcal{P}) + \gamma^{D}_{\mathcal{P}} \left( E[\mathcal{S}_n \mid \mathcal{F}_n] - n\mathcal{P} \right) \tag{56}
\]

for $n \ge 0$ is a submartingale, and we will use Doob's theorem to prove the lemma. In particular, for two stopping times $\tau_i \le \tau_f$, Doob's theorem says that if, for both $s = i$ and $s = f$,

\[
E[|W_{\tau_s}|] < \infty \quad \text{and} \quad \lim_{n \to \infty} E\left[ |W_n|\, I_{\{\tau_s > n\}} \right] = 0 \tag{57}
\]

then $E[W_{\tau_i}]$ and $E[W_{\tau_f}]$ exist and satisfy

\[
E[W_{\tau_f}] \ge E[W_{\tau_i}] \tag{58}
\]

For the moment, assume that the condition of (57) is satisfied. Then substituting the definition of $W_n$ for $n = \tau_i$ and $n = \tau_f$ into (58),



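\[
E\left[ \ln \frac{\mathcal{H}_{\mathcal{F}_{\tau_f}}}{\mathcal{H}_{\mathcal{F}_{\tau_i}}} \right] + D(\mathcal{P}) E[\tau_f - \tau_i] + \gamma^{D}_{\mathcal{P}} \left( E[\mathcal{S}_{\tau_f} - \mathcal{S}_{\tau_i}] - \mathcal{P} E[\tau_f - \tau_i] \right) \ge 0
\]

Inserting the assumption $E[\mathcal{S}_{\tau_f} - \mathcal{S}_{\tau_i}] \le \mathcal{P} E[\tau_f - \tau_i]$,

\[
E\left[ \ln \frac{\mathcal{H}_{\mathcal{F}_{\tau_f}}}{\mathcal{H}_{\mathcal{F}_{\tau_i}}} \right] + D(\mathcal{P}) E[\tau_f - \tau_i] \ge 0
\]

This is equivalent to the result of the lemma, so we need only establish the condition in (57) to complete the proof. For the first part, we can modify (56) to bound $|W_{\tau_s}|$ as

\[
|W_{\tau_s}| \le |\ln \mathcal{H}_{\mathcal{F}_{\tau_s}}| + \tau_s D(\mathcal{P}) + \gamma^{D}_{\mathcal{P}} \left( E[\mathcal{S}_{\tau_s} \mid \mathcal{F}_{\tau_s}] + \tau_s \mathcal{P} \right)
\]

All but the first of these terms clearly have finite expectations, so the first part of (57) reduces to proving that $E[|\ln \mathcal{H}_{\mathcal{F}_{\tau_s}}|] < \infty$. Since $\mathcal{H}_{\mathcal{F}_0} = \ln M$,

\begin{align}
|\ln \mathcal{H}_{\mathcal{F}_{\tau_s}}| &\le \left| \sum_{n=0}^{\tau_s - 1} \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \right| + |\ln \ln M| \notag \\
&\le \sum_{n=0}^{\tau_s - 1} \left| \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \right| + |\ln \ln M| \tag{59}
\end{align}

Now using Lemma 3 we have

\begin{align}
E\left[ \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \,\middle|\, \mathcal{F}_n \right] &\le E[D_{X_{n+1}} \mid \mathcal{F}_n] \tag{60} \\
&\le D\left( E[\mathcal{S}_{n+1} - \mathcal{S}_n \mid \mathcal{F}_n] \right) \tag{61}
\end{align}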
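where the second inequality follows from the concavity of the function $D(\cdot)$.

For any random variable $v$, we have $v = v I_{\{v \ge 0\}} + v I_{\{v < 0\}}$ and $|v| = v I_{\{v \ge 0\}} - v I_{\{v < 0\}}$. Combining these, $|v| = v - 2 v I_{\{v < 0\}}$. Applying this to the random variable $\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}}$,

\[
E\left[ \left| \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \right| \,\middle|\, \mathcal{F}_n \right] \le D\left( E[\mathcal{S}_{n+1} - \mathcal{S}_n \mid \mathcal{F}_n] \right) - 2 E\left[ \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}}\, I_{\{\mathcal{H}_{\mathcal{F}_n} < \mathcal{H}_{\mathcal{F}_{n+1}}\}} \,\middle|\, \mathcal{F}_n \right]
\]

The last term above can be bounded as

\begin{align}
- E\left[ \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}}\, I_{\{\mathcal{H}_{\mathcal{F}_n} < \mathcal{H}_{\mathcal{F}_{n+1}}\}} \,\middle|\, \mathcal{F}_n \right]
&= E\left[ \frac{\mathcal{H}_{\mathcal{F}_{n+1}}}{\mathcal{H}_{\mathcal{F}_n}} \left( - \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \right) I_{\{\mathcal{H}_{\mathcal{F}_n} < \mathcal{H}_{\mathcal{F}_{n+1}}\}} \,\middle|\, \mathcal{F}_n \right] \notag \\
&\overset{(a)}{\le} e\, E\left[ \frac{\mathcal{H}_{\mathcal{F}_{n+1}}}{\mathcal{H}_{\mathcal{F}_n}}\, I_{\{\mathcal{H}_{\mathcal{F}_n} < \mathcal{H}_{\mathcal{F}_{n+1}}\}} \,\middle|\, \mathcal{F}_n \right] \notag \\
&\le e\, E\left[ \frac{\mathcal{H}_{\mathcal{F}_{n+1}}}{\mathcal{H}_{\mathcal{F}_n}} \,\middle|\, \mathcal{F}_n \right] \notag \\
&\overset{(b)}{\le} e \tag{62}
\end{align}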



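where in (a) we have used the fact that $-x \ln x \le e$ for all $x > 0$ and in (b) we used $E[\mathcal{H}_{\mathcal{F}_{n+1}} \mid \mathcal{F}_n] \le \mathcal{H}_{\mathcal{F}_n}$. Thus,

\begin{align*}
E\left[ \left| \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\mathcal{H}_{\mathcal{F}_{n+1}}} \right| \,\middle|\, \mathcal{F}_n \right] &\le D\left( E[\mathcal{S}_{n+1} - \mathcal{S}_n \mid \mathcal{F}_n] \right) + 2e \\
&\le D(0) + 2e + D'(0)\, E[\mathcal{S}_{n+1} - \mathcal{S}_n \mid \mathcal{F}_n]
\end{align*}

where $D'(0) = \left. \frac{d}{d\mathcal{P}} D(\mathcal{P}) \right|_{\mathcal{P} = 0}$. Substituting this into the expectation of (59),

\[
E[|\ln \mathcal{H}_{\mathcal{F}_{\tau_s}}|] \le D(0) E[\tau_s] + D'(0) E[\mathcal{S}_{\tau_s}] + 2e E[\tau_s] + |\ln \ln M| \tag{63}
\]

This is finite, verifying the first part of the condition in (57).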
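The two elementary facts used in this part of the proof, the identity $|v| = v - 2vI_{\{v<0\}}$ and the bound on $-x \ln x$, can be checked numerically. The Python sketch below is only an illustration; the function names and the evaluation grid are assumptions made for the example. It also shows that the maximum of $-x \ln x$ is $1/e$, so the constant $e$ used in (62) holds with room to spare.

```python
import math


def abs_via_indicator(v):
    """Right side of |v| = v - 2 v I{v < 0}."""
    indicator = 1.0 if v < 0 else 0.0
    return v - 2.0 * v * indicator


def neg_x_log_x(x):
    """The function -x ln x for x > 0."""
    return -x * math.log(x)


# Evaluate -x ln x on a grid over (0, 5]; the grid is an arbitrary choice.
grid = [k / 1000.0 for k in range(1, 5001)]
max_val = max(neg_x_log_x(x) for x in grid)

if __name__ == "__main__":
    print(all(abs_via_indicator(v) == abs(v) for v in (-2.0, 0.0, 3.0)))
    print(max_val, 1.0 / math.e, math.e)
```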

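Finally we must verify the second part of the condition, i.e., that

\[
\lim_{n \to \infty} E\left[ |W_n|\, I_{\{\tau_s > n\}} \right] = 0
\]

Let $\xi'_n$ be

\begin{align*}
\xi'_n &\triangleq |W_n|\, I_{\{\tau_s > n\}} \\
&\le \left[ |\ln \mathcal{H}_{\mathcal{F}_n}| + n D(\mathcal{P}) + \gamma^{D}_{\mathcal{P}} \left( E[\mathcal{S}_n \mid \mathcal{F}_n] + \mathcal{P} n \right) \right] I_{\{\tau_s > n\}} \\
&\le \left[ \left| \ln \frac{\mathcal{H}_{\mathcal{F}_n}}{\ln M} \right| + |\ln \ln M| + n D(\mathcal{P}) + \gamma^{D}_{\mathcal{P}} \left( E[\mathcal{S}_n \mid \mathcal{F}_n] + \mathcal{P} n \right) \right] I_{\{\tau_s > n\}} \\
&\le \left[ \sum_{k=0}^{n-1} \left| \ln \frac{\mathcal{H}_{\mathcal{F}_k}}{\mathcal{H}_{\mathcal{F}_{k+1}}} \right| + |\ln \ln M| + n D(\mathcal{P}) + \gamma^{D}_{\mathcal{P}} \left( E[\mathcal{S}_n \mid \mathcal{F}_n] + \mathcal{P} n \right) \right] I_{\{\tau_s > n\}}
\end{align*}

Following the same set of steps as in (55),

\[
\xi'_n \le \sum_{k=0}^{\tau_s - 1} \left| \ln \frac{\mathcal{H}_{\mathcal{F}_k}}{\mathcal{H}_{\mathcal{F}_{k+1}}} \right| + |\ln \ln M| + \tau_s D(\mathcal{P}) + \gamma^{D}_{\mathcal{P}} \left( M E[\mathcal{S}_{\tau_s}] + \mathcal{P} \tau_s \right) \triangleq \zeta'
\]

Taking the expectation of both sides, using (63), and using the hypotheses $E[\tau_s] < \infty$ and $E[\mathcal{S}_{\tau_s}] < \infty$, we see that $E[\zeta'] < \infty$. Thus Lebesgue's dominated convergence theorem, together with the fact that $\lim_{n \to \infty} P[\xi'_n = 0] = 1$, implies that $\lim_{n \to \infty} E[\xi'_n] = 0$. QED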

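Proof of Lemma 11

We use the same shorthand notation as in the proof of Lemma O to upper bound $\ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}}$. Let $Y_{n+1} = j$. Then

\[
\ln \mathcal{H}_{\mathcal{F}_n} = \ln \sum_{i=1}^{M} p(i) \ln \frac{1}{p(i)}, \qquad \ln \mathcal{H}_{\mathcal{F}_{n+1}} = \ln \sum_{i=1}^{M} p(i|j) \ln \frac{1}{p(i|j)}
\]

Note that

\[
\frac{p(i)}{p(i|j)} = \frac{\psi(j)}{\psi(j|i)} \le \max_{k,l} \frac{P_{kj}}{P_{lj}} \tag{64}
\]

where we have used the fact that both $\psi(j)$ and $\psi(j|i)$ are in the convex hull of the set of transition probabilities $P_{kj}$. Using the non-negativity of the divergence followed by (64),

\[
\sum_{i=1}^{M} p(i) \ln \frac{1}{p(i)} \le \sum_{i=1}^{M} p(i) \ln \frac{1}{p(i|j)} \le \left( \max_{k,l} \frac{P_{kj}}{P_{lj}} \right) \sum_{i=1}^{M} p(i|j) \ln \frac{1}{p(i|j)}
\]

Including $j$ in the maximization above, this is valid for all possible outputs $Y_{n+1}$ and is thus equivalent to (35). QED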
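The chain of inequalities above can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions: the 3-by-3 transition matrix `CHANNEL`, the random prior, the deterministic message-to-input map, and the function names are all hypothetical choices made for the example, not taken from the paper. For each output $j$ it compares the prior entropy with the posterior entropy scaled by $\max_{k,l} P_{kj}/P_{lj}$.

```python
import math
import random

# Hypothetical channel with strictly positive transition probabilities P[k][j].
CHANNEL = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.25, 0.25, 0.50],
]


def entropy(dist):
    """Entropy in nats of a probability vector."""
    return -sum(q * math.log(q) for q in dist if q > 0)


def check_entropy_ratio(num_messages, channel, rng):
    """Return, for each output j, the pair (H(prior), ratio_j * H(posterior))."""
    num_inputs = len(channel)
    num_outputs = len(channel[0])
    weights = [rng.random() + 1e-3 for _ in range(num_messages)]
    total = sum(weights)
    prior = [w / total for w in weights]
    # Each message is mapped to one channel input at this time step.
    inputs = [rng.randrange(num_inputs) for _ in range(num_messages)]
    results = []
    for j in range(num_outputs):
        psi_j = sum(prior[i] * channel[inputs[i]][j] for i in range(num_messages))
        posterior = [prior[i] * channel[inputs[i]][j] / psi_j
                     for i in range(num_messages)]
        ratio = max(row[j] for row in channel) / min(row[j] for row in channel)
        results.append((entropy(prior), ratio * entropy(posterior)))
    return results


if __name__ == "__main__":
    for lhs, rhs in check_entropy_ratio(5, CHANNEL, random.Random(0)):
        print(lhs <= rhs + 1e-12)
```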

Proof of the Converse, Theorem 4

Theorem 4 will be proved for the discrete-time memoryless channels defined in Section 4. This includes DMC's with $P_{kj} > 0$ for all $k, j$ as a special case. The discussion in Subsection 3.4.1 is valid except for $\ln \mathcal{H}_{\mathcal{F}_{\tau_1}} \ge -F$ and the consequent inequality (36). As a substitute, we will use Assumption 2 of Section 4 to show that $E[\tau - \tau_1]$ can be lower bounded, for each $\Delta > 0$, by

\[
\forall \Delta > 0: \quad E[\tau - \tau_1] \ge \frac{-\ln P_e - \ln[\ln M - \ln P_e + 1] - \Delta - \epsilon(\Delta)(1 + \mathcal{P}) E[\tau]}{D(\mathcal{P}_2)} \tag{65}
\]

Proof of equation (65) will be presented subsequent to the current proof.

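Assume that the theorem is false. Then a sequence of codes, indexed by superscript $i$, exists such that the durations $\tau^i$ satisfy $\lim_{i \to \infty} E[\tau^i] = \infty$ and each code satisfies all the conditions above but violates (37). Define $\eta^i = \frac{E[\tau_1^i]}{E[\tau^i]}$. Then using (32) for the $i$th code and dividing both sides by $E[\tau^i]$, we get

\begin{align*}
\eta^i &\ge \frac{(\ln M^i) \left[ 1 - P_e^i (\ln M^i - \ln P_e^i + 1) \right] - 1}{E[\tau^i]\, C(\mathcal{P}_1^i)} \\
&\ge \frac{(R + \delta) \left[ 1 - P_e^i (\ln M^i - \ln P_e^i + 1) \right] - \frac{1}{E[\tau^i]}}{C(\mathcal{P}_1^i)}
\end{align*}

The term in brackets above approaches 1 since $P_e^i \le \exp\{-E[\tau^i][E(R, \mathcal{P}) + \delta]\}$ and $\ln(M^i)$ lies between $E[\tau^i](R + \delta)$ and $E[\tau^i] C(\mathcal{P})$. Thus

\[
\eta^i \ge \frac{R}{C(\mathcal{P}_1^i)} \quad \text{for sufficiently large } i, \tag{66}
\]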
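Similarly, dividing both sides of (65) by $E[\tau^i]$,

\begin{align*}
1 - \eta^i &\ge \frac{-\ln P_e^i - \ln[\ln M^i - \ln P_e^i + 1] - \Delta - (1 + \mathcal{P}) \epsilon(\Delta) E[\tau^i]}{E[\tau^i]\, D(\mathcal{P}_2^i)} \\
&\ge \frac{(E(R, \mathcal{P}) + \delta) \left[ 1 - \frac{\ln[\ln M^i - \ln P_e^i + 1]}{\ln(1/P_e^i)} \right] - \frac{\Delta}{E[\tau^i]} - (1 + \mathcal{P}) \epsilon(\Delta)}{D(\mathcal{P}_2^i)}
\end{align*}

Let $\Delta^*$ be such that $(1 + \mathcal{P}) \epsilon(\Delta^*) \le \delta/4$. Then for sufficiently large $i$,

\[
1 - \eta^i \ge \frac{E(R, \mathcal{P}) + \delta/2}{D(\mathcal{P}_2^i)} \tag{67}
\]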



From (66), $\mathcal{P}_1^i \ge C^{-1}(R/\eta^i)$, so from the energy constraint $E[\mathcal{S}_{\tau}] \le \mathcal{P} E[\tau] + \delta$, we have



K < 



Thus, using (67) and (68) for large enough $\tau^i$,



1 — rf 



+ 5/4 



E(R,P) + S/2 < (l-i/)D 

\ 1—7? / 

< £(P,P)+<5/4, (69) 

where the definition of $E(R, \mathcal{P})$ in (12) is used to establish a contradiction. QED

Proof of Inequality (65)

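For any channel and code, let $\nu_n$ be the random variable $\nu_n = \ln \mathcal{H}_{\mathcal{F}_n} - \ln \mathcal{H}_{\mathcal{F}_{n+1}}$. Assumption 2 of Section 4 asserts that there is a function $\epsilon(\Delta)$ satisfying $\lim_{\Delta \to \infty} \epsilon(\Delta) = 0$ such that for all $n$ and $\Delta > 0$,

\[
E\left[ \nu_n I_{\{\nu_n \ge \Delta\}} \mid \mathcal{F}_n \right] \le \epsilon(\Delta) \left( 1 + E[\mathcal{S}_{n+1} - \mathcal{S}_n \mid \mathcal{F}_n] \right) \tag{70}
\]

For all sample values $f_n$ of $\mathcal{F}_n$ such that $\mathcal{H}_{f_n} \ge 1$, we see that $\nu_n \ge -\ln \mathcal{H}_{\mathcal{F}_{n+1}}$. It follows from this that $I_{\{\nu_n \ge \Delta\}} \ge I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}}$ and thus that

\[
\nu_n I_{\{\nu_n \ge \Delta\}} \ge -\ln \mathcal{H}_{\mathcal{F}_{n+1}}\, I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}}
\]

Substituting this into (70) for all such $f_n$,

\[
E\left[ -\ln \mathcal{H}_{\mathcal{F}_{n+1}}\, I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}} \,\middle|\, \mathcal{F}_n = f_n \right] \le \epsilon(\Delta) \left( 1 + E[\mathcal{S}_{n+1} - \mathcal{S}_n \mid \mathcal{F}_n = f_n] \right) \tag{71}
\]

Note that $\mathcal{H}_{f_n} \ge 1$ holds, and thus (71) also holds, for each $f_n$ such that $n < \tau_1$. Thus, (71) will hold for all $f_n$ if the indicator function $I_{\{n < \tau_1\}}$ is inserted on both sides, i.e.,

\begin{align}
E\left[ -\ln \mathcal{H}_{\mathcal{F}_{n+1}}\, I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}} I_{\{n < \tau_1\}} \,\middle|\, \mathcal{F}_n \right] &\le \epsilon(\Delta)\, E\left[ (1 + \mathcal{S}_{n+1} - \mathcal{S}_n) I_{\{n < \tau_1\}} \,\middle|\, \mathcal{F}_n \right] \notag \\
E\left[ -\ln \mathcal{H}_{\mathcal{F}_{n+1}}\, I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}} I_{\{n < \tau_1\}} \right] &\le \epsilon(\Delta)\, E\left[ (1 + \mathcal{S}_{n+1} - \mathcal{S}_n) I_{\{n < \tau_1\}} \right] \tag{72}
\end{align}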
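where we have taken the expected value over $\mathcal{F}_n$ on both sides. Note that

\[
I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}} I_{\{n < \tau_1\}} = I_{\{-\ln \mathcal{H}_{\mathcal{F}_{n+1}} \ge \Delta\}} I_{\{n + 1 = \tau_1\}}
\]

For any $k > 0$, we next sum (72) over $0 \le n < k$:

\[
E\left[ -\ln \mathcal{H}_{\mathcal{F}_{\min(k, \tau_1)}}\, I_{\{-\ln \mathcal{H}_{\mathcal{F}_{\min(k, \tau_1)}} \ge \Delta\}} \right] \le \epsilon(\Delta)\, E\left[ \min(k, \tau_1) + \mathcal{S}_{\min(k, \tau_1)} \right]
\]

Using this together with $E[|\ln \mathcal{H}_{\mathcal{F}_{\tau_1}}|] < \infty$, $E[\tau_1] < \infty$ and Lebesgue's dominated convergence theorem, one can show that

\[
E\left[ \ln \mathcal{H}_{\mathcal{F}_{\tau_1}}\, I_{\{\mathcal{H}_{\mathcal{F}_{\tau_1}} \le \exp(-\Delta)\}} \right] \ge -\epsilon(\Delta) \left( E[\tau_1] + E[\mathcal{S}_{\tau_1}] \right) \tag{73}
\]

Consequently

\[
E[\ln \mathcal{H}_{\mathcal{F}_{\tau_1}}] \ge -\Delta - \epsilon(\Delta) \left( E[\tau_1] + E[\mathcal{S}_{\tau_1}] \right) \tag{74}
\]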
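The step from (73) to (74) splits the expectation according to whether $\ln \mathcal{H}_{\mathcal{F}_{\tau_1}}$ lies below $-\Delta$ and bounds the remaining part from below by $-\Delta$. The Python sketch below illustrates this splitting on an arbitrary finite sample; the function name and the sample values are assumptions made for the example, not quantities from the paper.

```python
def split_lower_bound(samples, delta):
    """Return (mean of L, mean of L * I{L <= -delta} minus delta).

    `samples` plays the role of equally likely values of L = ln H at time tau_1.
    """
    n = len(samples)
    mean_all = sum(samples) / n
    # Contribution of the event {L <= -delta}, i.e. {H <= exp(-delta)}.
    mean_tail = sum(x for x in samples if x <= -delta) / n
    return mean_all, mean_tail - delta


if __name__ == "__main__":
    values = [-5.0, -2.0, -0.5, 0.0, 1.5, 3.0]
    for delta in (0.1, 1.0, 4.0):
        mean_all, bound = split_lower_bound(values, delta)
        print(delta, mean_all >= bound)
```

Combining the second returned value with the bound (73) on the tail term gives the form of (74).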
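Now recall the inequality (34),

\[
E[\ln \mathcal{H}_{\mathcal{F}_{\tau}}] \le \ln[P_e (\ln M - \ln P_e + 1)]
\]

Using Lemma 10 we get

\begin{align*}
E[\tau - \tau_1] &\ge \frac{E[\ln \mathcal{H}_{\mathcal{F}_{\tau_1}} - \ln \mathcal{H}_{\mathcal{F}_{\tau}}]}{D(\mathcal{P}_2)} \\
&\ge \frac{-\ln P_e - \ln[\ln M - \ln P_e + 1] - \Delta - \epsilon(\Delta) \left( E[\tau_1] + E[\mathcal{S}_{\tau_1}] \right)}{D(\mathcal{P}_2)} \\
&\ge \frac{-\ln P_e - \ln[\ln M - \ln P_e + 1] - \Delta - \epsilon(\Delta)(1 + \mathcal{P}) E[\tau]}{D(\mathcal{P}_2)}
\end{align*}

QED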